Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology.

Lea Canales, Sebastian Menke, Miren Taberna, Stephanie Marchesseau, Carlos Del Rio-Bermudez, Ariel D’Agostino, Jorge Tello

doi:10.2196/20492

Abstract

Background: Clinical natural language processing (cNLP) systems are of crucial importance due to their increasing capability to extract clinically important information from the free text contained in electronic health records (EHRs). Converting a nonstructured representation of a patient’s clinical history into a structured format enables medical doctors to generate clinical knowledge at a level that was not possible before. In turn, interpreting the insights provided by cNLP systems has great potential to drive decisions about clinical practice. However, carrying out robust evaluations of these cNLP systems is a complex task, hindered by a lack of standard guidance on how to approach them systematically.

Objective: Our objective was to offer natural language processing (NLP) experts a methodology for the evaluation of cNLP systems to assist them in carrying out this task. By following the proposed phases, the robustness and representativeness of the performance metrics of their own cNLP systems can be assured.

Methods: The proposed evaluation methodology comprised five phases: (1) the definition of the target population, (2) the statistical document collection, (3) the design of the annotation guidelines and annotation project, (4) the external annotations, and (5) the cNLP system performance evaluation. We presented the application of all phases to evaluate the performance of a cNLP system called “EHRead Technology” (developed by Savana, an international medical company), applied in a study on patients with asthma. As part of the evaluation methodology, we introduced the Sample Size Calculator for Evaluations (SLiCE), a software tool that calculates the number of documents needed to achieve a statistically useful and resourceful gold standard.

Results: The application of the proposed evaluation methodology to a real use-case study of patients with asthma revealed the benefit of the different phases for cNLP system evaluations. By using SLiCE to adjust the number of documents needed, a meaningful and resourceful gold standard was created. In the presented use case, using as few as 519 EHRs, it was possible to evaluate the performance of the cNLP system and obtain performance metrics for the primary variable within the expected CIs.

Conclusions: We showed that our evaluation methodology can offer guidance to NLP experts on how to approach the evaluation of their cNLP systems. By following the five phases, NLP experts can assure the robustness of their evaluation and avoid unnecessary investment of human and financial resources. Besides the theoretical guidance, we offer SLiCE as an easy-to-use, open-source Python library.
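The abstract does not spell out how SLiCE arrives at a document count, so the following is only a minimal sketch of the general idea, not SLiCE's API or algorithm. It assumes the textbook normal-approximation formula for estimating a proportion (such as precision or recall) within a chosen CI half-width, n = z² · p(1 − p) / d². The function names `sample_size_for_proportion` and `documents_needed`, and the assumption that a fixed fraction of documents mention the variable, are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(expected: float, half_width: float,
                               confidence: float = 0.95) -> int:
    """Smallest n for which a normal-approximation CI around a proportion
    has at most the requested half-width (hypothetical helper)."""
    if not 0.0 < expected < 1.0:
        raise ValueError("expected must lie strictly between 0 and 1")
    if not 0.0 < half_width < 1.0:
        raise ValueError("half_width must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil(z ** 2 * expected * (1.0 - expected) / half_width ** 2)


def documents_needed(n_instances: int, frequency: float) -> int:
    """Scale an instance count to a document count, ASSUMING a fraction
    `frequency` of documents mentions the variable (illustrative only)."""
    if not 0.0 < frequency <= 1.0:
        raise ValueError("frequency must lie in (0, 1]")
    return math.ceil(n_instances / frequency)


if __name__ == "__main__":
    # Worst case (expected proportion 0.5), +/- 5 points at 95% confidence.
    n = sample_size_for_proportion(expected=0.5, half_width=0.05)
    print(n)  # 385
    print(documents_needed(n, frequency=0.5))  # 770
```

The expected proportion drives the result: a higher anticipated precision or recall needs fewer annotated instances for the same interval width, which is one reason a calculator of this kind can limit annotation effort.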

Highlights

Over the last decades, health care institutions have increasingly abandoned clinical records in paper form and have started to store patients’ longitudinal medical information in electronic health records (EHRs).
Our evaluation methodology is a set of methods and principles used to perform a clinical natural language processing (cNLP) system evaluation, which extends from the establishment of the reference standard to the measurement and presentation of the evaluation metrics.
Application of the methodology: The proposed evaluation methodology has been applied for the evaluation of cNLP systems in several clinical research projects at Savana.
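To make the final step, the measurement and presentation of evaluation metrics, more concrete, here is a minimal sketch of how system output might be scored against a reference standard. It assumes the usual precision, recall, and F1 metrics computed from true-positive, false-positive, and false-negative counts, and it uses the Wilson score interval for the CIs. The text here does not state which metrics or interval method the authors used, so both choices, and the function names, are assumptions.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, total: int,
                    confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p = successes / total
    denom = 1.0 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


def evaluate(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from counts against a gold standard,
    with Wilson intervals for precision and recall."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    return {
        "precision": tp / (tp + fp),
        "precision_ci": wilson_interval(tp, tp + fp),
        "recall": tp / (tp + fn),
        "recall_ci": wilson_interval(tp, tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not results from the study.
    for name, value in evaluate(tp=90, fp=10, fn=30).items():
        print(name, value)
```

Reporting an interval alongside each point estimate shows whether the gold standard was large enough for the metric to fall within the expected CI.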

Summary

Introduction

Health care institutions have increasingly abandoned clinical records in paper form and have started to store patients’ longitudinal medical information in electronic health records (EHRs). The importance and complexity of improving cNLP systems have given rise to strong engagement among researchers in developing methods capable of doing so [10,11,12,13,14,15,16]. This has resulted in improved cNLP systems that have dramatically changed the scale at which information contained in the free-text portion of EHRs can be utilized [17,18,19,20] and that have provided valuable insights into clinical populations [21,22,23,24,25,26,27], epidemiology trends [28,29,30], patient management [31], pharmacovigilance [32], and optimization of hospital resources [33]. We offer SLiCE as an easy-to-use, open-source Python library.



Journal: JMIR Medical Informatics
Publication Date: Jul 23, 2021
Citations: 44
License type: CC BY


