Use of Natural Language Processing for Precise Retrieval of Key Elements of Health IT Evaluation Studies.

Verena Dornauer, Konrad Hoeffner, Franziska Jahn, Alfred Winter, Elske Ammenwerth

doi:10.3233/shti200502

Abstract

Precise information about health IT evaluation studies is important for evidence-based decisions in medical informatics. In an earlier feasibility study, we used a faceted search based on ontological modeling of key elements of studies to retrieve precisely described health IT evaluation studies. However, manually extracting the key elements for the ontology was time- and resource-intensive. We therefore aimed to apply natural language processing to replace manual data extraction with automatic data extraction. Four methods (Named Entity Recognition, Bag-of-Words, Term Frequency-Inverse Document Frequency, and Latent Dirichlet Allocation Topic Modeling) were applied to 24 health IT evaluation studies. We evaluated which of these methods was best suited for extracting the key elements of each study, using the results of manual extraction as the gold standard. Named Entity Recognition proved promising but needs to be adapted to the existing study context. After this adaptation, key elements of studies could be collected in a more feasible, time- and resource-saving way.
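To make two of the named methods concrete, the following is a minimal sketch of Bag-of-Words counting and TF-IDF ranking, followed by a comparison of the top-ranked terms against manually extracted terms. It is not the authors' pipeline: the abstract does not state which tools, tokenizer, or comparison metric were used, so the tokenizer, the `tf * log(N / df)` weighting variant, the use of precision and recall, and all function names here are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (deliberately simple)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Bag-of-Words: map each term to its raw count in one document."""
    return Counter(tokenize(text))


def tf_idf(docs):
    """Return one {term: score} dict per document using tf * log(N / df).

    tf is the term count divided by the document length; df is the number
    of documents containing the term. This is one common variant, assumed here.
    """
    bags = [bag_of_words(doc) for doc in docs]
    n_docs = len(bags)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    df = Counter(term for bag in bags for term in bag)
    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in bag.items()}
        )
    return scores


def top_terms(score, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def precision_recall(predicted, gold):
    """Compare extracted terms with a manually curated gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

In use, one would call `tf_idf` on the list of study texts, take `top_terms` for each study, and pass them to `precision_recall` together with that study's manually extracted key elements. Terms that occur in every document receive a score of zero under this weighting, which is why generic words such as "evaluated" drop out of the ranking.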

Full Text