Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features

Buzhou Tang, Hongxin Cao, Yonghui Wu, Hua Xu, Min Jiang

doi:10.1186/1472-6947-13-s1-s1

Abstract

Background

Named entity recognition (NER) is an important task in clinical natural language processing (NLP) research. Machine learning (ML) based NER methods have shown good performance in recognizing entities in clinical text. Algorithms and features are two important factors that largely determine the performance of ML-based NER systems. Conditional Random Fields (CRFs), a sequential labelling algorithm, and Support Vector Machines (SVMs), which are based on large margin theory, are two typical machine learning algorithms that have been widely applied to clinical NER tasks. As for features, syntactic and semantic information about context words has often been used in clinical NER systems. However, Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, and word representation features, which are learned from a large unlabelled corpus by unsupervised algorithms and provide word-level back-off information, have not been extensively investigated for clinical text processing. Therefore, the primary goal of this study is to evaluate the use of SSVMs and word representation features in clinical NER tasks.

Methods

In this study, we developed SSVM-based NER systems to recognize clinical entities in hospital discharge summaries, using the data set from the concept extraction task in the 2010 i2b2 NLP challenge. We compared the performance of CRF-based and SSVM-based NER classifiers using the same feature sets. Furthermore, we extracted two different types of word representation features (clustering-based representation features and distributional representation features) and integrated them with the SSVM-based clinical NER system. We then reported the performance of SSVM-based NER systems with different types of word representation features.

Results and discussion

Using the same training (N = 27,837) and test (N = 45,009) sets as in the challenge, our evaluation showed that the SSVM-based NER systems achieved better performance than the CRF-based systems for clinical entity recognition when the same features were used. Both types of word representation features (clustering-based and distributional representations) improved the performance of ML-based NER systems. By combining the two types of word representation features with SSVMs, our system achieved its highest F-measure, 85.82%, outperforming the best system reported in the challenge by 0.6%. Our results show that SSVMs are a promising algorithm for clinical NLP research, and that both types of unsupervised word representation features benefit clinical NER tasks.
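To make the contrast between the two algorithms concrete: a linear-chain CRF is trained by maximizing conditional log-likelihood, while a common structural SVM formulation minimizes a margin-rescaled structured hinge loss, whose inner maximization is a Viterbi search in which every wrongly tagged token earns a bonus of 1 (Hamming loss). The minimal pure-Python sketch below shows only that loss-augmented inference and the resulting hinge value for one sentence. The score matrices and the function names (`viterbi`, `sequence_score`, `structured_hinge`) are hypothetical, and this is not necessarily the toolkit or exact loss the authors used.

```python
def viterbi(emission, transition, gold=None):
    """Best tag sequence for a linear chain (lists of lists of floats).

    emission[t][k]   score of tag k at token t
    transition[j][k] score of tag j followed by tag k
    If `gold` is given, every tag that differs from gold[t] gets +1
    (Hamming loss), i.e. loss-augmented inference as used in SSVM training.
    """
    T, K = len(emission), len(emission[0])

    def local(t, k):
        bonus = 1.0 if gold is not None and k != gold[t] else 0.0
        return emission[t][k] + bonus

    delta = [local(0, k) for k in range(K)]  # best score of a prefix ending in tag k
    backpointers = []
    for t in range(1, T):
        new_delta, ptr = [], []
        for k in range(K):
            j = max(range(K), key=lambda i: delta[i] + transition[i][k])
            ptr.append(j)
            new_delta.append(delta[j] + transition[j][k] + local(t, k))
        delta = new_delta
        backpointers.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(range(K), key=lambda k: delta[k])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


def sequence_score(emission, transition, tags):
    """Linear score of one tag sequence (stands in for w . Phi(x, y))."""
    s = sum(emission[t][y] for t, y in enumerate(tags))
    s += sum(transition[a][b] for a, b in zip(tags, tags[1:]))
    return float(s)


def structured_hinge(emission, transition, gold):
    """Margin-rescaled hinge: max_y [score(y) + Hamming(y, gold)] - score(gold)."""
    y_hat = viterbi(emission, transition, gold)
    hamming = sum(a != b for a, b in zip(y_hat, gold))
    loss = sequence_score(emission, transition, y_hat) + hamming
    loss -= sequence_score(emission, transition, gold)
    return loss, y_hat
```

The loss is zero only when the gold sequence beats every other sequence by a margin at least as large as its Hamming distance, which is the large-margin idea from SVMs applied to whole tag sequences. In training, the weights behind `emission` and `transition` would be updated along the subgradient given by the difference between the feature counts of the gold sequence and of `y_hat`; that update loop is omitted here.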

Highlights

Named entity recognition (NER) is an important task in clinical natural language processing (NLP) research
In our previous work presented at the ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO’12) [28], we explored the use of Structural Support Vector Machines (SSVMs), combined features, clustering-based word representation features and tag representations for clinical entity recognition
When both types of word representation features were combined with SSVMs, our system achieved its highest F-measure, 85.82%, an improvement of 0.4% over the baseline system, and outperformed the best system reported in the challenge by 0.6%
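As an illustration of how the two kinds of word representation can be turned into discrete token features: a clustering-based representation such as Brown clustering gives each word a bit string (its path in a binary merge tree), and prefixes of that string act as progressively coarser clusters; a distributional representation gives each word a vector, from which features such as nearest neighbours can be derived. This sketch assumes Brown-style clusters, prefix lengths of 4, 6, 10 and 20, and a single nearest-neighbour feature. The bit strings and vectors are invented toy values, not data from the study, and the paper's actual feature templates may differ; using the raw vector dimensions as real-valued features is another common option.

```python
import math

# Toy, hypothetical Brown-style cluster paths: in practice these come from
# running a clustering algorithm over a large unlabelled clinical corpus.
BROWN_PATHS = {
    "aspirin": "0010110",
    "ibuprofen": "0010111",
    "pain": "1101001",
    "headache": "1101000",
}

# Toy, hypothetical distributional vectors for the same vocabulary.
VECTORS = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.8, 0.2, 0.1],
    "pain": [0.1, 0.9, 0.3],
    "headache": [0.0, 0.8, 0.4],
}


def cluster_features(word, prefix_lengths=(4, 6, 10, 20)):
    """Clustering-based features: prefixes of the word's cluster bit string.

    Shorter prefixes are coarser clusters, so a rare word shares features
    with more frequent words in the same subtree (word-level back-off).
    """
    path = BROWN_PATHS.get(word.lower())
    if path is None:
        return ["brown=UNK"]
    return [f"brown_p{n}={path[:n]}" for n in prefix_lengths]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def distributional_features(word, k=1):
    """Distributional features: the k nearest neighbours by cosine similarity."""
    key = word.lower()
    vec = VECTORS.get(key)
    if vec is None:
        return ["nn=UNK"]
    neighbours = sorted(
        ((cosine(vec, v), w) for w, v in VECTORS.items() if w != key),
        reverse=True,
    )
    return [f"nn={w}" for _, w in neighbours[:k]]


def word_representation_features(word):
    """Combine both feature types for one token."""
    return cluster_features(word) + distributional_features(word)
```

Because `aspirin` and `ibuprofen` share the prefixes `0010` and `001011` in this toy table, a tagger that saw only one of them in the labelled training data can still fall back on the shared prefix features for the other.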

Summary

Introduction

Natural language processing (NLP) technologies, which can extract structured clinical information from narrative text, have been introduced to the medical domain for more than a decade [1]. Named entity recognition (NER), which identifies the boundaries of words/phrases in free text and determines their semantic classes (e.g., person names, locations, or organizations), is an important task in NLP research. In the 2010 i2b2 NLP challenge, many participating teams, including all of the top five systems (with F-measures ranging from 81.3% to 85.2%), relied primarily on machine learning approaches [16,17,18]. Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, and word representation features, which are learned from a large unlabelled corpus by unsupervised algorithms and provide word-level back-off information, have not been extensively investigated for clinical text processing.
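In the 2010 i2b2 concept extraction task the entity classes are medical problems, treatments and tests. NER of this kind is commonly cast as token-level sequence labelling with BIO tags, so that one tag per token encodes both the boundary and the semantic class. The sketch below assumes BIO tags such as `B-problem` and uses hypothetical helper names; it converts tags back to entity spans and computes exact-match precision, recall and F-measure. It is a simplified illustration, not the official challenge scorer.

```python
def bio_to_spans(tags):
    """Convert BIO tags to (start, end, type) spans; `end` is exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes the last entity
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if tag == "O" or starts_new:
            if etype is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if starts_new:
                # An I- tag that does not continue the open entity starts a new one.
                start, etype = i, tag[2:]
    return spans


def exact_match_prf(gold_tags, pred_tags):
    """Entity-level precision, recall, F-measure: boundary and class must both match."""
    gold = set(bio_to_spans(gold_tags))
    pred = set(bio_to_spans(pred_tags))
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# "chest pain on aspirin , CT negative": predicted problem boundary is one token short.
gold = ["B-problem", "I-problem", "O", "B-treatment", "O", "B-test", "O"]
pred = ["O", "B-problem", "O", "B-treatment", "O", "B-test", "O"]
print(exact_match_prf(gold, pred))
```

Under exact matching, the truncated "pain" prediction counts as both a false positive and a false negative, so a single boundary error lowers precision and recall at once.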


Journal: BMC Medical Informatics and Decision Making	Publication Date: Apr 1, 2013
Citations: 140	License type: cc-by

Similar Papers

Clinical entity recognition using structural support vector machines with rich features
Buzhou Tang ... Min Jiang
29 Oct 2012

A comprehensive study of named entity recognition in Chinese clinical text
J Lei ... M Jiang
Journal of the American Medical Informatics Association | VOL. 21
17 Dec 2013

A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature
Buzhou Tang ... Yonghui Wu
Journal of Cheminformatics | VOL. 7
19 Jan 2015

Recognition of medication information from discharge summaries using ensembles of classifiers
Son Doan ... Pham Hoang Duy
BMC Medical Informatics and Decision Making | VOL. 12
07 May 2012