Abstract

An electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or population. To enhance the meaningful use of EHRs, information extraction techniques have been developed to recognize clinical concepts mentioned in EHRs. Nevertheless, the clinical judgment expressed in an EHR cannot be determined from the recognized concepts alone, without considering their context. To improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents. Rather than formulating section heading recognition as a sentence classification problem, this work proposed a token-based formulation using a conditional random field (CRF) model. A standard section heading recognition corpus was compiled by annotators with clinical experience to evaluate the system and compare it with sentence classification and dictionary-based approaches. In the experiments, the proposed method achieved an F-score of 0.942, outperforming the sentence-based approach by 0.087 and the best dictionary-based system by 0.096. One important advantage of the token-based formulation over the sentence-based approach is that it provides an integrated solution, with no need to develop additional heuristic rules for isolating the headings from the surrounding section contents.
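The difference between the two formulations can be illustrated with a small sketch. It only shows how a token-level labeling represents a heading that shares a line with its content; it does not train a CRF. The label names (`B-HEAD`, `I-HEAD`, `O`), the sample note, the tokenizer, and the function names are all assumptions made for illustration and are not taken from the paper.

```python
import re

def tokenize(text):
    """Split text into (token, start, end) triples: word runs or single punctuation marks."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]

def bio_labels(text, heading_spans):
    """Assign B-HEAD / I-HEAD / O to each token from character spans of headings.

    The BIO scheme is an assumed encoding; the abstract does not name one.
    """
    labeled = []
    for tok, start, end in tokenize(text):
        label = "O"
        for h_start, h_end in heading_spans:
            if start >= h_start and end <= h_end:
                label = "B-HEAD" if start == h_start else "I-HEAD"
                break
        labeled.append((tok, label))
    return labeled

def headings_from_labels(labeled):
    """Recover heading strings directly from a labeled token sequence."""
    headings, current = [], []
    for tok, label in labeled:
        if label == "B-HEAD":
            if current:
                headings.append(" ".join(current))
            current = [tok]
        elif label == "I-HEAD" and current:
            current.append(tok)
        else:
            if current:
                headings.append(" ".join(current))
            current = []
    if current:
        headings.append(" ".join(current))
    return headings

# Hypothetical note in which headings and content share one line.
note = "Chief Complaint: chest pain. History of Present Illness: onset two days ago."
names = ("Chief Complaint", "History of Present Illness")
spans = [(note.index(n), note.index(n) + len(n)) for n in names]

labeled = bio_labels(note, spans)
recovered = headings_from_labels(labeled)
```

A sentence classifier would flag the whole line as containing a heading and would then need extra rules to cut the heading away from "chest pain". With token-level labels, the heading boundaries come out of the label sequence itself, which is the integrated behavior the abstract attributes to the token-based formulation.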

Highlights

  • An electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or population

  • According to a study by Capurro [2], approximately 50% of EHR data collected from sources like clinical notes, radiology reports, and discharge summaries is stored as free text

  • The best recall on both datasets was achieved by the dictionary-based method 2 with the section names from the training set and section header terminology (SecTag)


Summary

Introduction

An electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or population. According to a study by Capurro [2], approximately 50% of EHR data collected from sources like clinical notes, radiology reports, and discharge summaries is stored as free text. This unstructured format makes it difficult to retrieve meaningful information from EHRs. In light of this issue, information extraction (IE) techniques have been applied to the unstructured parts of EHRs to assist clinical decision support and foster analysis and clinical research [3].
