Extracting laboratory test information from paper-based reports

Ming-Wei Ma, Xian-Shu Gao, Ze-Yu Zhang, Shi-Yu Shang, Ling Jin, Pei-Lin Liu, Feng Lv, Wei Ni, Yu-Chen Han, Hui Zong

doi:10.1186/s12911-023-02346-6

Abstract

Background: Despite the substantial adoption of electronic health information systems in healthcare, a significant proportion of medical reports still exist in paper-based formats. As a result, there is strong demand for digitizing the information in these reports. However, converting paper-based laboratory reports into a structured data format is challenging because of their non-standard layouts, which include various data types such as text, numeric values, reference ranges, and units. It is therefore crucial to develop a highly scalable and lightweight technique that can identify and extract information from laboratory test reports and convert it into a structured data format for downstream tasks.

Methods: We developed an end-to-end Natural Language Processing (NLP)-based pipeline for extracting information from paper-based laboratory test reports. The pipeline consists of two main modules: an optical character recognition (OCR) module and an information extraction (IE) module. The OCR module locates and identifies text in scanned laboratory test reports using state-of-the-art OCR algorithms. The IE module then extracts meaningful information from the OCR results to form digitalized tables of the test reports. The IE module consists of five sub-modules: time detection, headline position, line normalization, Named Entity Recognition (NER) with a Conditional Random Fields (CRF)-based method, and step detection for multi-column layouts. Finally, we evaluated the performance of the proposed pipeline on 153 laboratory test reports collected from Peking University First Hospital (PKU1).

Results: In the OCR module, we evaluated the accuracy of the text detection and recognition results at three different levels and achieved an average accuracy of 0.93. In the IE module, we extracted four laboratory test entities: test item name, test result, test unit, and reference value range. The overall F1 score was 0.86 on the 153 laboratory test reports collected from PKU1. On a single CPU, the average inference time for each report was only 0.78 s.

Conclusion: In this study, we developed a practical, lightweight pipeline to digitalize and extract information from paper-based laboratory test reports of diverse types and layouts, which can be adopted in real clinical environments with minimal computing resource requirements. The high evaluation performance on the real-world hospital dataset validated the feasibility of the proposed pipeline.
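To make the target of the IE module concrete, the following Python sketch turns one normalized OCR text line into the four entities named in the abstract (test item name, test result, test unit, reference value range). It is only an illustration under stated assumptions: it uses a regular expression as a stand-in and is not the CRF-based NER the authors describe. The `LabRow` type, the `ROW` pattern, the `parse_line` function, and the sample lines are all hypothetical, and the sketch handles only numeric results in an "item, result, unit, range" column order.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabRow:
    """One structured row of a laboratory report (hypothetical schema)."""
    item: str
    result: str
    unit: Optional[str]
    reference: Optional[str]


# Assumed line layout: item name, numeric result, optional unit,
# optional "low-high" reference range. Real reports vary widely.
ROW = re.compile(
    r"^(?P<item>.+?)\s+"
    r"(?P<result>[<>]?\d+(?:\.\d+)?)"
    # The unit must not itself look like the start of a "low-high" range.
    r"(?:\s+(?!\d+(?:\.\d+)?\s*-)(?P<unit>\S+))?"
    r"(?:\s+(?P<ref>\d+(?:\.\d+)?\s*-\s*\d+(?:\.\d+)?))?"
    r"\s*$"
)


def parse_line(line: str) -> Optional[LabRow]:
    """Return a LabRow for a line matching the assumed layout, else None."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    return LabRow(match["item"], match["result"], match["unit"], match["ref"])


if __name__ == "__main__":
    for text in (
        "White blood cell count 6.2 10^9/L 3.5-9.5",
        "Glucose 5.1 3.9-6.1",
        "Report date: 2023-01-05",
    ):
        print(parse_line(text))
```

A fixed pattern like this breaks as soon as the column order or spacing changes, which is one reason a learned sequence labeler such as a CRF is a plausible choice for reports with non-standard layouts.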


Journal: BMC Medical Informatics and Decision Making
Publication Date: Nov 6, 2023
Citations: 3
License type: CC BY 4.0

Similar Papers

Information Extraction for Tracking Liver Cancer Patients' Statuses: From Mixture of Clinical Narrative Report Types
Xiao-Ou Ping ... Ja-Der Liang
Telemedicine and e-Health | VOL. 19
20 Jul 2013

Kulak Burun Boğaz Taburcu Notlarından Birliktelik Kurallarının Çıkartılması [Extraction of Association Rules from Ear, Nose and Throat Discharge Notes]
Başak Oğuz Yolcular ... Uğur Bilge
Bilişim Teknolojileri Dergisi | VOL. 11
31 Jan 2018

APIE: An information extraction module designed based on the pipeline method
Xu Jiang ... Baoquan Ma
Array | VOL. 21
01 Dec 2023

Digital Image Text Recognition Using Machine Learning Algorithms
Chaitanya U ... Harsitha Ballam
International Journal for Research in Applied Science and Engineering Technology | VOL. 11
30 Jun 2023
