Abstract

Background

Natural language processing (NLP) tools can facilitate the extraction of biomedical concepts from unstructured free texts, such as research articles or clinical notes. The NLP software tools CLAMP, cTAKES, and MetaMap are among the most widely used tools to extract biomedical concept entities. However, their performance in extracting disease-specific terminology from literature has not been compared extensively, especially for complex neuropsychiatric disorders with a diverse set of phenotypic and clinical manifestations.

Methods

We comparatively evaluated these NLP tools using autism spectrum disorder (ASD) as a case study. We collected 827 ASD-related terms from previous literature as the benchmark list for performance evaluation. Then, we applied CLAMP, cTAKES, and MetaMap to 544 full-text articles and 20,408 abstracts from PubMed to extract ASD-related terms. We evaluated the predictive performance using precision, recall, and F1 score.

Results

We found that CLAMP has the best performance in terms of F1 score, followed by cTAKES and then MetaMap. Our results show that CLAMP has much higher precision than cTAKES and MetaMap, while cTAKES and MetaMap have higher recall than CLAMP.

Conclusion

The analysis protocols used in this study can be applied to other neuropsychiatric or neurodevelopmental disorders that lack well-defined terminology sets to describe their phenotypic presentations.
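The evaluation compares the terms each tool extracts against the benchmark list. Precision is the fraction of predicted terms that appear in the benchmark, recall is the fraction of benchmark terms that were predicted, and F1 is their harmonic mean. Below is a minimal Python sketch of this set-based scoring. It assumes exact matching after lowercasing and whitespace normalization, which may differ from the matching criteria the study actually used. The names `normalize`, `evaluate`, and `Scores` and the toy terms are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def normalize(term: str) -> str:
    # Assumed normalization (hypothetical): lowercase and collapse internal whitespace.
    return " ".join(term.lower().split())


def evaluate(predicted: Iterable[str], benchmark: Iterable[str]) -> Scores:
    """Set-based precision, recall, and F1 of predicted terms against a benchmark list."""
    pred = {normalize(t) for t in predicted}
    gold = {normalize(t) for t in benchmark}
    tp = len(pred & gold)  # predicted terms that also appear in the benchmark
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical toy data, not the study's 827-term benchmark.
    benchmark = ["echolalia", "stereotyped behavior", "social communication deficit", "sensory sensitivity"]
    predicted = ["Echolalia", "stereotyped  behavior", "seizure", "anxiety"]
    print(evaluate(predicted, benchmark))  # Scores(precision=0.5, recall=0.5, f1=0.5)
```

With two of four predictions matching two of four benchmark terms, precision and recall are both 0.5. A tool that predicts many loosely related terms raises recall at the cost of precision, which is the trade-off reported between CLAMP and the other two tools.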

Highlights

  • Natural language processing (NLP) tools can facilitate the extraction of biomedical concepts from unstructured free texts, such as research articles or clinical notes

  • MetaMap was published in 2001 and is considered the foundational biomedical information extraction tool developed by the National Library of Medicine. cTAKES was developed later, by Mayo Clinic in 2010, and includes additional NLP modules for processing clinical notes with rule-based and machine learning-based approaches

  • We found that CLAMP has the best performance in terms of F1 score, followed by cTAKES and then MetaMap, both in the baseline results and after filtering the predicted entities by Unified Medical Language System (UMLS) semantic type and removing comorbid psychiatric disorders
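The post-processing step described above (filtering predicted entities by UMLS semantic type and removing comorbid psychiatric disorders) can be sketched as a simple filter. This is a minimal Python illustration under stated assumptions: the allow-list of semantic type identifiers (TUIs), the comorbidity exclusion list, matching comorbidities on lowercased surface text, and the semantic types assigned in the toy data are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Hypothetical allow-list of UMLS semantic type identifiers (TUIs); the study's list may differ.
# T033 Finding, T047 Disease or Syndrome, T048 Mental or Behavioral Dysfunction,
# T055 Individual Behavior, T184 Sign or Symptom.
KEEP_TYPES: FrozenSet[str] = frozenset({"T033", "T047", "T048", "T055", "T184"})

# Hypothetical exclusion list of comorbid psychiatric disorders, matched on lowercased text.
COMORBID: FrozenSet[str] = frozenset({"anxiety", "depression", "attention deficit hyperactivity disorder"})


@dataclass(frozen=True)
class Entity:
    text: str
    semantic_types: FrozenSet[str]  # UMLS TUIs assigned to the entity's concept


def filter_entities(
    entities: Iterable[Entity],
    keep_types: FrozenSet[str] = KEEP_TYPES,
    comorbid: FrozenSet[str] = COMORBID,
) -> List[Entity]:
    kept = []
    for e in entities:
        if not (e.semantic_types & keep_types):
            continue  # drop entities with no allowed semantic type
        if e.text.lower() in comorbid:
            continue  # drop comorbid psychiatric disorders
        kept.append(e)
    return kept


if __name__ == "__main__":
    # Illustrative semantic type assignments, not looked up in UMLS.
    entities = [
        Entity("echolalia", frozenset({"T184"})),
        Entity("Anxiety", frozenset({"T048"})),
        Entity("risperidone", frozenset({"T109", "T121"})),
    ]
    print([e.text for e in filter_entities(entities)])  # ['echolalia']
```

Matching comorbidities by concept identifier (CUI) rather than surface text would be more robust to spelling variants; surface text is used here only to keep the sketch short.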


Summary

Introduction

The extraction of biomedical concepts and entities, such as genes, drugs, and symptoms, is one of the initial steps in many natural language processing (NLP) analyses; it constitutes a named-entity recognition (NER) task tailored to the biomedical domain. NLP tools can facilitate the extraction of such concepts from unstructured free texts, such as research articles or clinical notes. The NLP software tools CLAMP, cTAKES, and MetaMap are among the most widely used tools for extracting biomedical concept entities. Compared with the other two, the recently developed CLAMP places greater emphasis on flexibility in building customized pipelines, with diverse options for information extraction. However, the performance of these tools in extracting disease-specific terminology from literature has not been compared extensively, especially for complex neuropsychiatric disorders with a diverse set of phenotypic and clinical manifestations.
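To make the NER framing concrete, here is a minimal Python sketch of the simplest form of biomedical entity extraction: greedy longest-match lookup of lexicon phrases in text. This is a swapped-in illustrative technique, not how CLAMP, cTAKES, or MetaMap work internally; those tools add far more linguistic processing and, in some cases, machine learning. The function names, the lexicon, and its labels are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def dictionary_ner(text: str, lexicon: Dict[str, str], max_len: int = 4) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup.

    lexicon maps a normalized phrase (tokens joined by single spaces) to an entity label.
    Returns (phrase, label) pairs in text order.
    """
    tokens = tokenize(text)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                found.append((phrase, lexicon[phrase]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no lexicon entry starts here; move to the next token
    return found


if __name__ == "__main__":
    lexicon = {"risperidone": "drug", "repetitive behavior": "symptom", "shank3": "gene"}
    print(dictionary_ner("The note mentions risperidone, repetitive behavior, and SHANK3.", lexicon))
    # [('risperidone', 'drug'), ('repetitive behavior', 'symptom'), ('shank3', 'gene')]
```

Preferring the longest match keeps a multi-word term such as "repetitive behavior" from being split into shorter entries. Pure lexicon lookup misses paraphrases and abbreviations, which is part of why the dedicated tools add normalization and context handling.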
