Abstract

Funding Acknowledgements: Type of funding sources: Public grant(s) – National budget only. Main funding source(s): Ministry of Science and Higher Education.

Background: Retrospective and observational analyses are an important part of cardiovascular research. The adoption of Electronic Health Records (EHR) has improved the availability of medical documentation for research purposes, but rapid data collection is hampered by the predominantly unstructured nature of EHR. A significant proportion of the data is textual information that cannot be used for research until it is manually coded into a database by a healthcare professional through manual chart review. This conventional method is both costly and time-consuming. Text mining could accelerate the process, but little is known about the attainable data accuracy or the achievable time savings.

Purpose: We developed a comprehensive text-processing tool for the cardiology field. The algorithm employs advanced text processing based on a specifically designed, extensive database of medical terminology, drug lists and echocardiography parameters, with a data structure tailored to the needs of cardiovascular research. The algorithm can automatically analyze three types of textual data that are standard parts of a discharge summary in Poland: (1) the descriptive list of medical diagnoses; (2) discharge recommendations; (3) the echocardiography report (if performed). A set of discharge summaries was analyzed with both the conventional method and the algorithm to demonstrate the acquisition of basic characteristics of a cohort of patients with atrial fibrillation/flutter (AF/AFl).

Methods: Discharge summaries of 394 patients hospitalized at one cardiology department were (1) analyzed automatically by the algorithm and (2) manually coded into the database by a healthcare professional. The manual coding used a purpose-built proprietary annotation tool to accelerate the annotation process, minimize errors and record the total effective data acquisition time.

Results: Manual data analysis took 19:11 hours and automatic data analysis took 0:15 hours. Of the total 394 patients, 319 (81%) had AF/AFl according to both manual identification and the algorithm (full agreement). The characteristics of the study group obtained with the manual and automatic methods are presented in the Table. There was high agreement in the detection of diseases, the presence of drug groups and echocardiographic parameters; some differences between the two classifications were noted but did not reach statistical significance. The median CHA2DS2-VASc score calculated by the algorithm was 4 (IQR 3-5) versus 4 (IQR 3-5) for the manual method (p=0.92), and the median HAS-BLED score calculated by the algorithm was 1 (IQR 1-2) versus 1 (IQR 1-2) for the manual method (p=0.98).

Conclusions: Use of the algorithm greatly reduced the time required to acquire the basic characteristics of the study group without significantly compromising the quality of the data. Automatic identification of retrospective study cohorts by applying text processing techniques to electronic health records in Polish is feasible and promising.
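To illustrate how a risk score can be derived once the individual diagnoses have been extracted, the following is a minimal Python sketch of the standard CHA2DS2-VASc point scheme. It is not the authors' implementation: the `PatientFlags` record and its field names are hypothetical stand-ins for whatever structured output a text-processing step might produce, and only the point weights follow the published score definition.

```python
from dataclasses import dataclass


@dataclass
class PatientFlags:
    """Hypothetical per-patient record; field names are illustrative assumptions."""
    age: int
    female: bool
    heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    stroke_tia_thromboembolism: bool = False
    vascular_disease: bool = False


def cha2ds2_vasc(p: PatientFlags) -> int:
    """Return the CHA2DS2-VASc score (0-9) using the standard point weights."""
    score = 0
    score += 1 if p.heart_failure else 0               # C: congestive heart failure
    score += 1 if p.hypertension else 0                # H: hypertension
    score += 1 if p.diabetes else 0                    # D: diabetes mellitus
    score += 2 if p.stroke_tia_thromboembolism else 0  # S2: prior stroke/TIA/thromboembolism
    score += 1 if p.vascular_disease else 0            # V: vascular disease
    score += 1 if p.female else 0                      # Sc: sex category (female)
    # Age contributes 2 points at 75 or older (A2), 1 point at 65-74 (A).
    if p.age >= 75:
        score += 2
    elif p.age >= 65:
        score += 1
    return score


if __name__ == "__main__":
    example = PatientFlags(age=78, female=True, hypertension=True, diabetes=True)
    print(cha2ds2_vasc(example))
```

In such a design, the accuracy of the computed score depends entirely on how reliably each flag is extracted from the free text, which is why the comparison against manual chart review matters.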

