Creating Efficiencies in the Extraction of Data From Randomized Trials: A Prospective Evaluation of a Machine Learning and Text Mining Tool

Allison Gates, Sarah A Elliott, Shannon Sim, Lisa Hartling, Michelle Gates, Jennifer Pillay

doi:10.23970/ahrqepcmethodscreatingefficiencies

Abstract

Background. Machine learning tools that semi-automate data extraction may create efficiencies in systematic review production. We prospectively evaluated an online machine learning and text mining tool’s ability to (a) automatically extract data elements from randomized trials, and (b) save time compared with manual extraction and verification.

Methods. For 75 randomized trials published in 2017, we manually extracted and verified data for 21 unique data elements. We uploaded the randomized trials to ExaCT, an online machine learning and text mining tool, and quantified performance by evaluating the tool’s ability to identify the reporting of data elements (reported or not reported), and the relevance of the extracted sentences, fragments, and overall solutions. For each randomized trial, we measured the time to complete manual extraction and verification, and to review and amend the data extracted by ExaCT (simulating semi-automated data extraction). We summarized the relevance of the extractions for each data element using counts and proportions, and calculated the median and interquartile range (IQR) across data elements. We calculated the median (IQR) time for manual and semi-automated data extraction, and overall time savings.

Results. The tool identified the reporting (reported or not reported) of data elements with median (IQR) 91 percent (75% to 99%) accuracy. Performance was perfect for four data elements: eligibility criteria, enrolment end date, control arm, and primary outcome(s). Among the top five sentences for each data element, at least one sentence was relevant in a median (IQR) 88 percent (83% to 99%) of cases. Performance was perfect for four data elements: funding number, registration number, enrolment start date, and route of administration. In a median (IQR) 90 percent (86% to 96%) of relevant sentences, pertinent fragments had been highlighted by the system; exact matches were unreliable (median (IQR) 52 percent [32% to 73%]). A median 48 percent of solutions were fully correct, but performance varied greatly across data elements (IQR 21% to 71%). Using ExaCT to assist the first reviewer resulted in modest time savings compared with manual extraction by a single reviewer (17.9 vs. 21.6 hours total extraction time across 75 randomized trials).

Conclusions. Using ExaCT to assist with data extraction resulted in modest gains in efficiency compared with manual extraction. The tool was reliable for identifying the reporting of most data elements. The tool’s ability to identify at least one relevant sentence and highlight pertinent fragments was generally good, but changes to sentence selection and/or highlighting were often required.
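
The summary statistics described in the Methods and the time comparison in the Results can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names `summarize` and `time_savings` are hypothetical, the per-element proportions in `example` are invented for illustration only, and the choice of the "inclusive" quartile method is an assumption (the abstract does not state how quartiles were computed). Only the totals of 21.6 and 17.9 hours and the count of 75 trials come from the abstract.

```python
from statistics import median, quantiles


def summarize(proportions):
    """Return (median, Q1, Q3) across per-data-element proportions.

    Assumption: quartiles use the 'inclusive' method; the abstract
    does not say which quartile definition the authors applied.
    """
    q1, _, q3 = quantiles(proportions, n=4, method="inclusive")
    return median(proportions), q1, q3


def time_savings(manual_hours, assisted_hours, n_trials):
    """Derive absolute, relative, and per-trial savings from total hours."""
    saved = manual_hours - assisted_hours
    return {
        "hours_saved": round(saved, 1),
        "percent_saved": round(100 * saved / manual_hours, 1),
        # Mean minutes saved per trial, derived from the totals only.
        "minutes_saved_per_trial": round(60 * saved / n_trials, 1),
    }


# Totals reported in the abstract: 21.6 h manual vs. 17.9 h with ExaCT, 75 trials.
savings = time_savings(manual_hours=21.6, assisted_hours=17.9, n_trials=75)
print(savings)

# Hypothetical per-element proportions, NOT data from the study.
example = [0.21, 0.48, 0.52, 0.71, 0.90]
print(summarize(example))
```

With the reported totals, the difference works out to 3.7 hours, or roughly 17 percent of the manual extraction time, which is about 3 minutes per trial on average. These per-trial figures are means derived from the totals and are not the medians the study reports.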

Similar Papers


Creating efficiencies in the extraction of data from randomized trials: a prospective evaluation of a machine learning and text mining tool
Allison Gates ... Sarah A Elliott
BMC Medical Research Methodology | VOL. 21
16 Aug 2021

Data for: Creating efficiencies in the extraction of data from randomized trials: A prospective evaluation of a machine learning and text mining tool
09 Nov 2020

Explainable machine learning practices: opening another black box for reliable medical AI
Emanuele Ratti ... Mark Graves
AI and Ethics | VOL. 2
15 Feb 2022

Detecting ADRD Caregivers’ Information Wants in Social Media: A Machine Learning–Aided Approach
Bo Xie ... Zhendong Wang
Innovation in Aging | VOL. 4
16 Dec 2020
