Abstract

Background: Knowledge representation frameworks are essential to the understanding of complex biomedical processes, and to the analysis of biomedical texts that describe them. Combined with natural language processing (NLP), they have the potential to contribute to retrospective studies by unlocking important phenotyping information contained in the narrative content of electronic health records (EHRs). This work aims to develop an extensive information representation scheme for clinical information contained in EHR narratives, and to support secondary use of EHR narrative data to answer clinical questions.

Methods: We review recent work that proposed information representation schemes and applied them to the analysis of clinical narratives. We then propose a unifying scheme that supports the extraction of information to address a large variety of clinical questions.

Results: We devised a new information representation scheme for clinical narratives that comprises 13 entities, 11 attributes and 37 relations. The associated annotation guidelines can be used to consistently apply the scheme to clinical narratives and are available at https://cabernet.limsi.fr/annotation_guide_for_the_merlot_french_clinical_corpus-Sept2016.pdf.

Conclusion: The information scheme includes many elements of the major schemes described in the clinical natural language processing literature, as well as a uniquely detailed set of relations.
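The three layers named in the abstract (entities, attributes, relations) can be pictured as standoff annotations over a text. The Python sketch below is only an illustration under stated assumptions: the labels `Disorder`, `Drug`, `negated` and `treats`, the example sentence, and every class and function name are hypothetical placeholders, not the inventory defined by the scheme. The actual entity, attribute and relation types are those listed in the annotation guidelines.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    ident: str          # identifier used by relations to refer to this entity
    label: str          # entity type (placeholder label, not from the scheme)
    start: int          # character offset in the document, inclusive
    end: int            # character offset in the document, exclusive
    attributes: dict = field(default_factory=dict)  # attribute name -> value

    def text(self, document: str) -> str:
        """Return the span of the document covered by this entity."""
        return document[self.start:self.end]


@dataclass
class Relation:
    label: str          # relation type (placeholder label, not from the scheme)
    source: str         # ident of the source entity
    target: str         # ident of the target entity


def span(document: str, mention: str) -> tuple:
    """Locate the first occurrence of a mention and return its offsets."""
    start = document.index(mention)
    return start, start + len(mention)


def build_example():
    """Build a toy annotated sentence (invented for illustration)."""
    document = "No fever. Amoxicillin was given for pneumonia."
    entities = [
        Entity("T1", "Disorder", *span(document, "fever"), {"negated": True}),
        Entity("T2", "Drug", *span(document, "Amoxicillin")),
        Entity("T3", "Disorder", *span(document, "pneumonia")),
    ]
    relations = [Relation("treats", "T2", "T3")]
    return document, entities, relations


def related_pairs(document, entities, relations, label):
    """Return (source text, target text) pairs for relations with a given label."""
    by_id = {entity.ident: entity for entity in entities}
    return [
        (by_id[r.source].text(document), by_id[r.target].text(document))
        for r in relations
        if r.label == label
    ]
```

Keeping attributes on the entity and relations as separate records that point to entity identifiers mirrors the usual standoff layout, where the text itself is never modified by annotation.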

Highlights

  • The progressive adoption of electronic health records (EHRs) is paving the way towards making large amounts of data available for research.

  • Natural language processing is essential to phenotyping EHR data because of the amount of clinical information buried in the narrative content.

  • We aim to support information extraction from clinical narratives in order to answer clinical questions such as: “What is the prevalence of incidental findings in patients with suspected thromboembolic disease?”, “What is the contribution of CT venography in the diagnosis of thromboembolic disease?” or “What are the types and grades of toxicities experienced by colon cancer patients receiving FOLFOX therapy?”

Summary

Introduction

The progressive adoption of electronic health records (EHRs) is paving the way towards making large amounts of data available for research. Natural language processing (NLP) is essential to phenotyping EHR data because of the amount of clinical information buried in the narrative content. Combined with NLP, knowledge representation frameworks have the potential to contribute to retrospective studies by unlocking important phenotyping information contained in the narrative content of EHRs. De-identification is usually performed by removing or replacing Personal Health Identifiers with surrogates [14]. This is one of the reasons why clinical corpora are less available than corpora in the biological domain [15, 16]. Improvements in clinical information processing have been reported with the adoption of adequate annotation frameworks [11, 15, 17, 18, 19]. These have been developed in two levels of representation. Most annotation efforts in the biomedical NLP community have followed this trend, especially within the organisation of research challenges.
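The idea of replacing identifiers with surrogates, mentioned above for de-identification, can be shown with a toy Python sketch. Everything in it is an assumption made for illustration: the `dd/mm/yyyy` date pattern, the caller-supplied list of names, the surrogate values and the function name `deidentify` are hypothetical, and real de-identification systems rely on far more robust detection than a single regular expression and a name list.

```python
import re

# Assumed date format for the illustration: dd/mm/yyyy.
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def deidentify(text, known_names, surrogate_name="John Doe",
               surrogate_date="01/01/2000"):
    """Replace dates and the listed names with fixed surrogate values."""
    # Replace every date that matches the assumed format.
    text = DATE_PATTERN.sub(surrogate_date, text)
    # Replace each known name, matched as a whole word sequence.
    for name in known_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, surrogate_name, text)
    return text
```

For example, `deidentify("Mr Martin Dupont was admitted on 12/03/2015.", ["Martin Dupont"])` replaces both the name and the date while leaving the rest of the sentence untouched, which keeps the text readable for later annotation.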
