Abstract

In this article, we describe the development of annotation guidelines for family history information in Norwegian clinical text. We make use of incrementally developed synthetic clinical text describing patients’ family history relating to cases of cardiac disease, and we present a general methodology that integrates the production of synthetic clinical statements with guideline development. We analyze inter-annotator agreement based on the developed guidelines and present results from machine learning experiments aimed at evaluating the validity and applicability of the annotated corpus. The resulting annotated corpus contains 477 sentences and 6030 tokens. Both the annotation guidelines and the annotated corpus are made freely available and as such constitute the first publicly available resource of Norwegian clinical text.

Highlights

  • The limited availability of clinical text corpora constitutes a major challenge for the development of clinical NLP tools. Such text originates in the electronic health record (EHR), and access to and use of the EHR are governed by strict data privacy and health service regulations, which usually restrict secondary use and prohibit redistribution and sharing with the larger NLP community

  • This article describes the systematic development of annotation guidelines for family history information in Norwegian clinical text

  • Because real health records describing family histories were unavailable, we developed a methodology for annotation guideline development that makes use of an incrementally developed synthetic corpus


Summary

Introduction

The limited availability of clinical text corpora constitutes a major challenge for the development of clinical NLP tools. Developing annotation guidelines is a time-consuming process which, in the case of clinical data, often requires access to domain experts (clinicians). This article describes the systematic development of annotation guidelines for family history information in Norwegian clinical text. We make use of incrementally developed synthetic clinical text describing patients’ family history relating to cases of cardiac disease. The domain expert is an integral part of this methodology: the expert generates synthetic examples that challenge the guidelines and also participates in both the annotation and the development of the guidelines. Ysis (Hiekkalinna et al, 2005)

Previous work
Clinical entities
Span of annotations
Entity detection
Relation extraction
Findings
Discussion
Conclusion
