Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies.

Shuai Zheng, Arshed A Quyyumi, Salim S Hayek, Nima Ghasemzadeh, Fusheng Wang, James J Lu

doi:10.2196/medinform.7235

Open Access

https://doi.org/10.2196/medinform.7235

Journal: JMIR medical informatics	Publication Date: May 9, 2017
Citations: 25	License type: cc-by

Affiliation: Emory University

Abstract

Background: Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction algorithm in real time.

Objective: Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enable a dynamic interaction between a human and a machine that produces highly accurate results.

Methods: A clinical information extraction system, IDEAL-X, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction.

Results: Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each of which combines a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%.

Conclusions: IDEAL-X adopts a unique online machine learning–based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, and it is therefore highly adaptable.

Highlights

While immense efforts have been made to enable structured data models for electronic medical records (EMRs), a large amount of medical data remains in free-form narrative text, and useful data from individual patients are usually distributed across multiple reports of heterogeneous structures and vocabularies.
Our goal is to provide a generic information extraction framework that is adaptable to diverse clinical reports, enables a dynamic interaction between a human and a machine, and produces highly accurate results with minimal human effort.
We have developed a system, Information and Data Extraction using Adaptive Online Learning (IDEAL-X), to support adaptive information extraction from diverse clinical reports with heterogeneous structures and vocabularies.

Summary

Introduction

While immense efforts have been made to enable structured data models for electronic medical records (EMRs), a large amount of medical data remains in free-form narrative text, and useful data from individual patients are usually distributed across multiple reports of heterogeneous structures and vocabularies. This poses major challenges to traditional information extraction systems, as either costly training datasets or manually crafted rules have to be prepared. Our goal is to provide a generic information extraction framework that is adaptable to diverse clinical reports, enables a dynamic interaction between a human and a machine, and produces highly accurate results with minimal human effort. To that end, a clinical information extraction system, IDEAL-X, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. This may limit its ability to extract certain value types.
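The workflow described in the abstract (process one document at a time, fold user corrections back into the model, and switch to batch mode once accuracy reaches a user-acceptable threshold) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from IDEAL-X: the mistake-driven perceptron stands in for the paper's unspecified learner, the feature set, the `VOCAB` entries, the sample reports, and the names `features`, `extract`, and `run_interactive` are hypothetical, and user feedback is simulated by a list of confirmed token positions.

```python
from collections import defaultdict, deque

# Hypothetical controlled vocabulary: surface term -> canonical concept.
VOCAB = {"ef": "ejection fraction", "lvef": "ejection fraction"}

def features(tokens, i, vocab):
    """Context features for token i (an illustrative, assumed feature set)."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    prev = vocab.get(prev, prev)  # normalize synonyms through the vocabulary
    shape = "num" if tokens[i].replace(".", "", 1).isdigit() else "word"
    return [f"prev={prev}", f"shape={shape}", f"prev+shape={prev}|{shape}"]

class OnlinePerceptron:
    """Binary perceptron updated one example at a time (stand-in learner)."""
    def __init__(self):
        self.w = defaultdict(float)

    def predict(self, feats):
        return sum(self.w.get(f, 0.0) for f in feats) > 0

    def update(self, feats, label):
        # Mistake-driven: weights change only when the prediction is wrong.
        if self.predict(feats) != label:
            step = 1.0 if label else -1.0
            for f in feats:
                self.w[f] += step

def extract(model, tokens, vocab):
    """Return the token positions the model would extract as values."""
    return [i for i in range(len(tokens))
            if model.predict(features(tokens, i, vocab))]

def run_interactive(model, docs, feedback, vocab, threshold=0.95, window=5):
    """Review documents one by one; return the index where batch mode starts."""
    recent = deque(maxlen=window)  # correctness of the last `window` documents
    for n, (tokens, confirmed) in enumerate(zip(docs, feedback)):
        recent.append(extract(model, tokens, vocab) == confirmed)
        # Simulated user feedback: confirmed positions are positive examples.
        for i in range(len(tokens)):
            model.update(features(tokens, i, vocab), i in confirmed)
        if len(recent) == window and sum(recent) / window >= threshold:
            return n + 1
    return len(docs)

# Invented toy reports; FEEDBACK holds the positions a reviewer would confirm.
DOCS = [
    ["EF", "55", "%"],
    ["age", "61", "EF", "40", "%"],
    ["EF", "35", "%"],
    ["EF", "62", "%", "age", "70"],
    ["age", "48", "LVEF", "50", "%"],
]
FEEDBACK = [[1], [3], [1], [1]]

model = OnlinePerceptron()
stop = run_interactive(model, DOCS, FEEDBACK, VOCAB, threshold=1.0, window=2)
batch = [extract(model, d, VOCAB) for d in DOCS[stop:]]
print(stop, batch)
```

In this toy run the reviewer only sees the first four reports; the fifth is handled in batch mode, and the vocabulary lets the model treat "LVEF" like the "EF" cue it learned earlier. A real system would need richer features and a more careful accuracy estimate than a two-document window.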

