Extractive text summarization system to aid data extraction from full text in systematic review development

Duy Duc An Bui, Guilherme Del Fiol, John F Hurdle, Siddhartha Jonnalagadda

doi:10.1016/j.jbi.2016.10.014

Open Access

https://doi.org/10.1016/j.jbi.2016.10.014

Abstract

Objectives: Extracting data from publication reports is a standard process in systematic review (SR) development. However, the data extraction process still relies heavily on manual effort, which is slow, costly, and subject to human error. In this study, we developed a text summarization system aimed at enhancing productivity and reducing errors in the traditional data extraction process.

Methods: We developed a computer system that used machine learning and natural language processing approaches to automatically generate summaries of full-text scientific publications. The summaries were evaluated at the sentence and fragment levels for their ability to find common clinical SR data elements such as sample size, group size, and PICO values. We compared the computer-generated summaries with human-written summaries (title and abstract) in terms of the presence of the information needed for data extraction, as presented in the study characteristics tables of Cochrane reviews.

Results: At the sentence level, the computer-generated summaries covered more of the information needed for systematic reviews than the human-written summaries (recall 91.2% vs. 83.8%, p<0.001). They also had a higher density of relevant sentences (precision 59% vs. 39%, p<0.001). At the fragment level, the ensemble approach combining rule-based, concept mapping, and dictionary-based methods performed better than any individual method alone, achieving an 84.7% F-measure.

Conclusion: Computer-generated summaries are potential alternative information sources for data extraction in systematic review development. Machine learning and natural language processing are promising approaches to the development of such an extractive summarization system.
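The recall, precision, and F-measure figures reported above can be illustrated with a minimal sketch, assuming the standard set-based definitions of these metrics. The abstract does not describe the exact evaluation protocol, so the function name `precision_recall_f1` and the toy sentence indices below are hypothetical and are not taken from the study.

```python
def precision_recall_f1(selected, relevant):
    """Score a summary against a gold standard using standard definitions.

    selected: indices of sentences included in the summary (hypothetical input).
    relevant: indices of sentences that contain a needed data element.
    """
    selected, relevant = set(selected), set(relevant)
    true_positives = len(selected & relevant)

    # Precision: share of summary sentences that are relevant (density).
    precision = true_positives / len(selected) if selected else 0.0
    # Recall: share of relevant sentences that the summary covers.
    recall = true_positives / len(relevant) if relevant else 0.0

    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only (not results from the paper).
relevant_sentences = {2, 5, 7, 9}
summary_sentences = {2, 5, 7, 11, 12}

p, r, f = precision_recall_f1(summary_sentences, relevant_sentences)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Under these definitions, a summary with higher recall misses fewer of the sentences a reviewer needs, while higher precision means the reviewer reads fewer irrelevant sentences to find them.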

Full Text