Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES

Martijn G Kersloot, Ronald Cornet, Ameen Abu-Hanna, Derk L Arts, Francis Lau

doi:10.1186/s13326-019-0207-3

Abstract

Background: Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extraction. However, most algorithms in MLP are institution-specific or address only one clinical need, and thus cannot be broadly applied. In addition, most MLP systems do not detect concepts in misspelled text and cannot detect attribute relationships between concepts. The objective of this study was to develop and evaluate an MLP application that includes generic algorithms for the detection of (misspelled) concepts and of attribute relationships between them.

Methods: An implementation of the MLP system cTAKES, called DIRECT, was developed with generic SNOMED CT concept filter, concept relationship detection, and attribute relationship detection algorithms and a custom dictionary. Four implementations of cTAKES were evaluated by comparing 98 manually annotated oncology charts with the output of DIRECT. The F1-score was determined for named-entity recognition and attribute relationship detection for the concepts ‘lung cancer’, ‘non-small cell lung cancer’, and ‘recurrence’. The performance of the four implementations was compared with a two-tailed permutation test.

Results: DIRECT detected lung cancer and non-small cell lung cancer concepts with F1-scores between 0.828 and 0.947 and between 0.862 and 0.933, respectively. The concept recurrence was detected with an F1-score of 0.921, significantly higher than that of the other implementations, and the relationship between recurrence and lung cancer was detected with an F1-score of 0.857. The precisions of the detection of lung cancer, non-small cell lung cancer, and recurrence concepts were 1.000, 0.966, and 0.879, respectively, compared with precisions of 0.943, 0.967, and 0.000 in the original implementation.

Conclusion: DIRECT can detect oncology concepts and attribute relationships with high precision and can detect recurrence with a significant increase in F1-score compared to the original implementation of cTAKES, due to the use of a custom dictionary and a generic concept relationship detection algorithm. These concepts and relationships can be used to encode clinical narratives, and can thus substantially reduce manual chart abstraction efforts, saving time for clinicians and researchers.
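The evaluation measures named in the abstract (precision, F1-score, and a two-tailed permutation test) can be illustrated with a minimal Python sketch. The abstract does not describe how the permutation test was set up, so the paired, chart-level scheme below is an assumption: each chart is given one outcome per system (`"tp"`, `"fp"`, `"fn"`, or `"tn"`), and the outcomes of the two systems are randomly swapped per chart. The function names and the example outcomes are hypothetical and are not taken from the study.

```python
import random


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1); each is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def f1_of(outcomes):
    """F1-score of a list of per-chart outcomes ("tp", "fp", "fn", "tn")."""
    return precision_recall_f1(
        outcomes.count("tp"), outcomes.count("fp"), outcomes.count("fn")
    )[2]


def permutation_test(outcomes_a, outcomes_b, n_iter=2000, seed=0):
    """Two-tailed paired permutation test on the F1 difference of two systems.

    Assumption: outcomes_a[i] and outcomes_b[i] refer to the same chart.
    Under the null hypothesis the two labels of a chart are exchangeable,
    so each pair is swapped with probability 0.5.
    """
    rng = random.Random(seed)
    observed = abs(f1_of(outcomes_a) - f1_of(outcomes_b))
    at_least_as_extreme = 0
    for _ in range(n_iter):
        perm_a, perm_b = [], []
        for a, b in zip(outcomes_a, outcomes_b):
            if rng.random() < 0.5:
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        if abs(f1_of(perm_a) - f1_of(perm_b)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_iter + 1)


# Hypothetical outcomes for 12 charts, for illustration only.
system_a = ["tp"] * 8 + ["fn"] * 2 + ["tn"] * 2
system_b = ["tp"] * 4 + ["fn"] * 6 + ["fp"] * 1 + ["tn"] * 1
print("F1 A:", f1_of(system_a), "F1 B:", f1_of(system_b))
print("p-value:", permutation_test(system_a, system_b))
```

A system that never returns a true positive gets a precision of 0.0 under this convention, which matches how a value such as the 0.000 reported for the original implementation would arise.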

Highlights

Information in Electronic Health Records is largely stored as unstructured free text.
Much of the data present in Electronic Health Records (EHRs) are stored as unstructured free text [1], as clinicians often resort to making free-text notes, despite available coding options [2].
The use of free text should be taken into account when EHR data are reused for other purposes [3], since data reuse for research and development of clinical decision support tools can improve healthcare [4].

Summary

Introduction

Information in Electronic Health Records (EHRs) is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims to extract structured information from free text through tasks such as named-entity recognition, and is less expensive and time-consuming than manual extraction [6]. MLP has proven successful in extracting diagnoses from free-text EHR notes, thereby reducing manual chart abstraction efforts. It can, for example, be used to automatically detect the recurrence of breast cancer in patient charts, reducing the number of manually reviewed charts by 90% [11]. Other research shows that MLP can identify uncodified diabetes cases, leading to more complete ascertainment of diagnoses, better information provision, and targeted care for patients [12].
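To make the idea of dictionary-based named-entity recognition on misspelled text concrete, the following Python sketch matches word windows against a small term dictionary using approximate string matching from the standard library. It is an illustration only and does not reproduce the algorithms of cTAKES or DIRECT. The dictionary, the labels (placeholders, not SNOMED CT identifiers), the function name `find_concepts`, and the similarity cutoff are all assumptions.

```python
import difflib
import re

# Hypothetical mini-dictionary: term -> placeholder concept label.
DICTIONARY = {
    "lung cancer": "LUNG_CANCER",
    "non-small cell lung cancer": "NSCLC",
    "recurrence": "RECURRENCE",
}


def find_concepts(text, dictionary=DICTIONARY, cutoff=0.85):
    """Return (matched text, concept label) pairs, tolerating small typos.

    Every window of 1..N tokens is compared with the dictionary terms,
    where N is the token length of the longest term. Overlapping hits
    are not de-duplicated in this sketch.
    """
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    terms = list(dictionary)
    max_len = max(len(term.split()) for term in terms)
    hits = []
    for size in range(max_len, 0, -1):
        for start in range(len(tokens) - size + 1):
            candidate = " ".join(tokens[start:start + size])
            # get_close_matches returns terms whose similarity ratio >= cutoff.
            match = difflib.get_close_matches(candidate, terms, n=1, cutoff=cutoff)
            if match:
                hits.append((candidate, dictionary[match[0]]))
    return hits


for hit in find_concepts("Recurence of lung cancr after surgery"):
    print(hit)
```

A high cutoff keeps unrelated words from matching at the cost of missing heavier misspellings; a production system would also need negation handling and context rules, which this sketch leaves out.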


Journal: Journal of Biomedical Semantics
Publication Date: Sep 18, 2019
Citations: 6
License type: open-access

Similar Papers

Commentary: Postrecurrence survival in patients with lung cancer after curative surgery warrants systematic investigation to optimize management strategies
Chi Sum Yuen ... Michael Hsin
JTCVS open | VOL. 10
21 Apr 2022

Automating Access to Real-World Evidence
Marie-Pier Gauthier ... Natasha B Leighl
JTO Clinical and Research Reports | VOL. 3
17 May 2022

Targeting MET Exon 14 Skipping Alterations: Has Lung Cancer MET Its Match?
Timothy A Yap ... Sanjay Popat
Journal of Thoracic Oncology | VOL. 12
01 Jan 2017

Application of serum SELDI proteomic patterns in diagnosis of lung cancer
Shuan-Ying Yang ... Da-Cheng He
BMC Cancer | VOL. 5
20 Jul 2005
