Abstract

Background: Information extraction techniques that derive structured representations from unstructured data make a large amount of clinically relevant patient information accessible to semantic applications. These methods typically rely on standardized terminologies that guide the process. Many languages and clinical domains, however, lack appropriate resources and tools, as well as evaluations of their applications, especially if detailed conceptualizations of the domain are required. For instance, German transthoracic echocardiography reports have not been targeted sufficiently before, despite their importance for clinical trials. This work therefore aimed to develop and evaluate an information extraction component with a fine-grained terminology that recognizes almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg.

Methods: A domain expert validated and iteratively refined an automatically inferred base terminology. The terminology was used by an ontology-driven information extraction system that outputs attribute-value pairs. The final component was mapped to the central elements of a standardized terminology and evaluated on documents with different layouts.

Results: The final system achieved state-of-the-art precision (micro average .996) and recall (micro average .961) on 100 test documents that represent more than 90 % of all reports. In particular, principal aspects as defined in a standardized external terminology were recognized with F1 = .989 (micro average) and F1 = .963 (macro average). As a result of keyword matching and restrained concept extraction, the system also obtained high precision on unstructured or exceptionally short documents and on documents with an uncommon layout.

Conclusions: The developed terminology and the proposed information extraction system allow fine-grained information to be extracted from German semi-structured transthoracic echocardiography reports with very high precision and high recall on the majority of documents at the University Hospital of Würzburg. Extracted results populate a clinical data warehouse that supports clinical research.

Electronic supplementary material: The online version of this article (doi:10.1186/s12911-015-0215-x) contains supplementary material, which is available to authorized users.
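The Results report both micro and macro averages, which differ in how per-concept counts are pooled. The following Python sketch illustrates the difference. The concept names and counts are invented for illustration and are not data from the study; the helper names (`Counts`, `micro_f1`, `macro_f1`) are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 expressed directly in terms of counts; 0.0 if nothing was seen."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_concept: dict[str, Counts]) -> float:
    """Pool all counts first, then compute a single F1 score."""
    tp = sum(c.tp for c in per_concept.values())
    fp = sum(c.fp for c in per_concept.values())
    fn = sum(c.fn for c in per_concept.values())
    return f1(tp, fp, fn)


def macro_f1(per_concept: dict[str, Counts]) -> float:
    """Compute F1 per concept, then take the unweighted mean."""
    scores = [f1(c.tp, c.fp, c.fn) for c in per_concept.values()]
    return sum(scores) / len(scores)


# Hypothetical counts: one frequent concept, one rare concept.
counts = {
    "frequent_concept": Counts(tp=90, fp=0, fn=2),
    "rare_concept": Counts(tp=1, fp=0, fn=1),
}

print(f"micro F1: {micro_f1(counts):.3f}")
print(f"macro F1: {macro_f1(counts):.3f}")
```

Because the macro average weights every concept equally, a few poorly recognized rare concepts lower it more than they lower the micro average, which is one plausible reason a macro score can sit below the micro score.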

Highlights

  • Information extraction techniques that derive structured representations from unstructured data make a large amount of clinically relevant patient information accessible to semantic applications

  • We address information extraction from German transthoracic echocardiography (TTE) reports with a broad coverage of relevant concepts

  • As depicted in rows three and four of Table 1, the results reported for their system show that rule-based information extraction performs well on clinical subdomains. Table 1 notes: Concepts = number of classes, concepts, or terminology used for reported results; (a) concept-level analysis, see related work for details; (b) named entity recognition results used as an upper estimate, see original work for more detailed figures; (c) application uses standardized resources such as UMLS or ICD-O with a large number of concepts; (d) omitted to reflect that precision and recall have been evaluated on different sets of sentences; (e) sentence-level classification of normal vs. pathological findings

Summary

Introduction

Information extraction techniques that derive structured representations from unstructured data make a large amount of clinically relevant patient information accessible to semantic applications. These methods typically rely on standardized terminologies that guide the process. This work aimed to develop and evaluate an information extraction component with a fine-grained terminology that recognizes almost all relevant information stated in German transthoracic echocardiography reports at the University Hospital of Würzburg. Information extraction in the clinical domain aims to translate textual reports into structured representations. It enables semantic information retrieval, the application of formal knowledge to patient management, and further data analysis such as clinical research based on statistics and evidence-based medicine. Clinical terminology extraction and ontology learning are active areas of research, especially for non-English research groups such as Marciniak et al. [17], who work to overcome the lack of appropriate resources and tools for their languages.
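To make the idea of translating a textual report into attribute-value pairs concrete, here is a minimal Python sketch of keyword matching against a tiny terminology. It is not the system described in this work: the attribute names, the trigger keywords, the example sentence, and the rule "take the first number after the keyword in the same sentence" are all assumptions chosen for illustration.

```python
import re

# Hypothetical mini-terminology: attribute name -> trigger keywords (lowercase).
TERMINOLOGY = {
    "ejection_fraction_percent": ["ejektionsfraktion", "lvef"],
    "lvedd_mm": ["lvedd"],
}

# Integer or decimal number; German reports may use a decimal comma.
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def extract(report: str) -> dict[str, float]:
    """Return attribute-value pairs found by keyword matching.

    Extraction is deliberately conservative: an attribute is only emitted
    when a keyword and a following number occur in the same sentence.
    """
    pairs: dict[str, float] = {}
    # Split on sentence-final punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        lowered = sentence.lower()
        for attribute, keywords in TERMINOLOGY.items():
            for keyword in keywords:
                match = re.search(r"\b" + re.escape(keyword) + r"\b", lowered)
                if not match:
                    continue
                # Look for the first number after the matched keyword.
                number = NUMBER.search(lowered, match.end())
                if number:
                    value = float(number.group().replace(",", "."))
                    pairs.setdefault(attribute, value)
                break  # stop after the first keyword that matched
    return pairs


print(extract("LVEDD 48 mm. Ejektionsfraktion 55 %."))
```

Requiring both a keyword and a value before emitting anything trades recall for precision, which matches the general intuition behind conservative extraction, although the actual system relies on a much richer, expert-validated terminology.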

