Abstract

Background: Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have also proven effective in clinical NLP. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble classifier presents both challenges and opportunities for improving performance in such NLP tasks.

Methods: We investigated ensemble classifiers that used different voting strategies to combine the outputs of three individual classifiers: a rule-based system, a support vector machine (SVM) based system, and a conditional random field (CRF) based system. Three voting methods were proposed and evaluated on the annotated data sets from the 2009 i2b2 NLP challenge: simple majority voting, local SVM-based voting, and local CRF-based voting.

Results: Evaluation on 268 manually annotated discharge summaries from the i2b2 challenge showed that the local CRF-based voting method achieved the best F-score, 90.84% (94.11% Precision, 87.81% Recall), in 10-fold cross-validation. We then compared our systems with the first-ranked system in the challenge using the same training and test sets. Our system based on majority voting achieved an F-score of 89.65% (93.91% Precision, 85.76% Recall), higher than the 89.19% (93.78% Precision, 85.03% Recall) previously reported for the first-ranked system.

Conclusions: Our experimental results on the 2009 i2b2 challenge data sets showed that ensemble classifiers combining individual classifiers in a voting system could achieve better performance than a single classifier in recognizing medication information from clinical text. This suggests that simple, easily implemented strategies such as majority voting have the potential to significantly improve clinical entity recognition.
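The abstract does not say whether votes were cast per token or per entity. As a minimal sketch, assuming each of the three systems emits one BIO label per token, simple majority voting could look like the following. The label names (B-MED, B-DOSAGE, and so on) and the tie-breaking rule are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter


def majority_vote(label_sequences, tie_breaker=0):
    """Combine per-token labels from several classifiers by simple majority.

    label_sequences: one list of labels per classifier, all the same length.
    tie_breaker: index of the classifier whose label is kept when no label
        wins a strict majority (a hypothetical rule chosen for this sketch).
    """
    n_classifiers = len(label_sequences)
    n_tokens = len(label_sequences[0])
    if any(len(seq) != n_tokens for seq in label_sequences):
        raise ValueError("all classifiers must label the same tokens")

    combined = []
    for i in range(n_tokens):
        votes = Counter(seq[i] for seq in label_sequences)
        label, count = votes.most_common(1)[0]
        if 2 * count > n_classifiers:
            combined.append(label)  # strict majority, e.g. 2 of 3
        else:
            combined.append(label_sequences[tie_breaker][i])  # no majority
    return combined


# Hypothetical outputs for the tokens "aspirin 81 mg daily"
rule_based = ["B-MED", "B-DOSAGE", "I-DOSAGE", "O"]
svm = ["B-MED", "B-DOSAGE", "O", "B-FREQ"]
crf = ["B-MED", "B-DOSAGE", "I-DOSAGE", "B-FREQ"]
print(majority_vote([rule_based, svm, crf]))
# ['B-MED', 'B-DOSAGE', 'I-DOSAGE', 'B-FREQ']
```

One caveat of voting token by token is that the combined sequence can contain an I- tag with no preceding B- tag, so a practical system would likely need a small repair step before extracting entities.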

Highlights

  • Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP)

  • Precision is the ratio of the number of named entities (NEs) correctly identified by the system to the total number of NEs found by the system; Recall is the ratio of the number of NEs correctly identified by the system to the number of NEs in the gold standard; and F-score is the harmonic mean of Precision and Recall

  • The local CRF-based voting system performed significantly better than the single conditional random field (CRF) system overall and for dosage, but was significantly less accurate than the single CRF system in recognizing duration
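The Precision, Recall, and F-score definitions above translate directly into a few lines of code. This is a minimal sketch, assuming entities are represented as hypothetical (start, end, type) token spans and that only exact matches of span and type count as correct; the text does not state the matching criterion used in the evaluation.

```python
def precision_recall_f(gold, predicted):
    """Entity-level Precision, Recall and F-score, returned as percentages.

    gold, predicted: sets of (start, end, type) tuples. An entity counts as
    correct only when span and type both match exactly (an assumption of this
    sketch).
    """
    correct = len(gold & predicted)  # NEs correctly identified by the system
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f_score = harmonic_mean(precision, recall)
    return 100 * precision, 100 * recall, 100 * f_score


def harmonic_mean(precision, recall):
    """F-score as the harmonic mean of Precision and Recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold standard and system output as (start, end, type) spans
gold = {(0, 1, "MED"), (1, 3, "DOSAGE"), (3, 4, "FREQ"), (5, 7, "DURATION")}
predicted = {(0, 1, "MED"), (1, 3, "DOSAGE"), (3, 4, "FREQ")}
print(precision_recall_f(gold, predicted))  # P = 100.0, R = 75.0, F about 85.71

# Recomputing F from the majority-voting figures reported in the abstract
print(round(harmonic_mean(93.91, 85.76), 2))  # 89.65
```

Because the harmonic mean is dominated by the smaller of its two inputs, a system with very high Precision but lower Recall, as in the results above, ends up with an F-score closer to its Recall.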


Summary

Introduction

Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Named Entity Recognition (NER) is an important step in NLP, with many applications in the general language domain such as identifying person names, locations, and organizations. There are two main types of approaches to identifying biomedical entities: rule-based and supervised machine learning based approaches. While rule-based approaches use existing biomedical knowledge/resources, … One way to harness the advantages of both approaches is to combine them into an ensemble classifier [4,6,8]. Zhou et al. [8] investigated the combination of three classifiers, including one SVM and two discriminative …
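The local SVM-based and CRF-based voting methods named in the abstract presumably train a second-level model to decide how to combine the base systems' outputs, but their details are not given here. The sketch below illustrates only that general stacking idea: a deliberately simple lookup table, not an SVM or CRF, learns from training data which label to output for each combination of base labels. The class name LearnedVoter, the (rule-based, SVM, CRF) vote order, and the fallback to plurality voting are all hypothetical.

```python
from collections import Counter, defaultdict


class LearnedVoter:
    """Hypothetical stacking-style combiner for per-token labels.

    For every combination of base-classifier labels seen in training, it keeps
    the gold label that most often accompanied that combination. This lookup
    table is a simplified stand-in for an SVM or CRF meta-classifier.
    """

    def __init__(self):
        self.table = {}

    def fit(self, base_outputs, gold_labels):
        # base_outputs: one tuple of base labels per token, e.g. (rule, svm, crf)
        counts = defaultdict(Counter)
        for votes, gold in zip(base_outputs, gold_labels):
            counts[tuple(votes)][gold] += 1
        self.table = {votes: c.most_common(1)[0][0] for votes, c in counts.items()}
        return self

    def predict(self, base_outputs):
        predictions = []
        for votes in map(tuple, base_outputs):
            if votes in self.table:
                predictions.append(self.table[votes])
            else:
                # Unseen combination: fall back to plurality voting
                predictions.append(Counter(votes).most_common(1)[0][0])
        return predictions


# Hypothetical training data in which the CRF alone is usually right about FREQ
train_votes = [("O", "O", "B-FREQ"), ("O", "O", "B-FREQ"),
               ("O", "O", "B-FREQ"), ("B-MED", "B-MED", "B-MED")]
train_gold = ["B-FREQ", "B-FREQ", "O", "B-MED"]
voter = LearnedVoter().fit(train_votes, train_gold)
print(voter.predict([("O", "O", "B-FREQ"), ("B-DURATION", "O", "O")]))
# ['B-FREQ', 'O']
```

Plain majority voting would output O for the first token; the learned table instead trusts the CRF in that context. This is the kind of gain a trained combiner can offer over a fixed voting rule, at the cost of needing labeled data for the combiner itself.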

