Abstract

Objective: This study aimed to validate trial patient eligibility screening and baseline data collection using text-mining in electronic healthcare records (EHRs), comparing the results to those of an international trial.

Study Design and Setting: In three medical centers with different EHR vendors, EHR-based text-mining was used to automatically screen patients for trial eligibility and extract baseline data on nineteen characteristics. First, the yield of screening with automated EHR text-mining search was compared with manual screening by research personnel. Second, the accuracy of extracted baseline data by EHR text-mining was compared to manual data entry by research personnel.

Results: Of the 92,466 patients visiting the out-patient cardiology departments, 568 (0.6%) were enrolled in the trial during its recruitment period using manual screening methods. Automated EHR data screening of all patients showed that the number of patients needed to screen could be reduced by 73,863 (79.9%). The remaining 18,603 (20.1%) contained 458 of the actual participants (82.4% of participants). In trial participants, automated EHR text-mining missed a median of 2.8% (interquartile range [IQR] across all variables 0.4–8.5%) of all data points compared to manually collected data. The overall accuracy of automatically extracted data was 88.0% (IQR 84.7–92.8%).

Conclusion: Automatically extracting data from EHRs using text-mining can be used to identify trial participants and to collect baseline information.

Highlights

  • Clinical research requires highly detailed information on large numbers of subjects, often acquired by many investigators and supporting staff

  • This study shows that integral text-mining of electronic healthcare records (EHRs) yields good results for trial participant screening and data collection

  • Automated EHR data screening resulted in a reduction of 73,863 (79.9%) patients that needed to be screened for trial participation
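
As a quick arithmetic check, the screening percentages can be recomputed from the counts reported in the abstract. The Python sketch below is illustrative only: the variable names and the `pct` helper are assumptions made for this example and are not part of the study's tooling.

```python
# Counts as reported in the abstract; variable names are illustrative.
total_patients = 92_466    # patients visiting the out-patient cardiology departments
enrolled = 568             # patients enrolled via manual screening
excluded_by_ehr = 73_863   # patients the automated EHR screen ruled out

# Patients who would still need manual screening after the automated step.
remaining = total_patients - excluded_by_ehr


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(remaining)                             # 18603
print(pct(enrolled, total_patients))         # 0.6
print(pct(excluded_by_ehr, total_patients))  # 79.9
print(pct(remaining, total_patients))        # 20.1
```

The recomputed values agree with the reported 0.6%, 79.9%, and 20.1%.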


Summary

Introduction

Clinical research requires highly detailed information on large numbers of subjects, often acquired by many investigators and supporting staff. Prospective research such as registries and randomized clinical trials (RCTs) needs to comply with high standards of data validity [1,2]. A major part of the costs of such research is attributable to participant recruitment and follow-up, which for a large part comprise data collection [5,6]. Standing practice for clinical trials is that dedicated personnel enter source data in distinct (electronic) clinical report forms (CRFs). These data are generally already collected in clinical care and available in electronic healthcare records (EHRs), so manual entry creates duplicate copies of data that already exist (Fig. 1A).
