A UMLS-based spell checker for natural language processing in vaccine safety.

Herman D Tolentino, Barbara Law, Paul Fontelo, Wesley Tong, Wikke Walop, Katrin Kohl, Daniel C Payne, Michael D Matters, Fang Liu

doi:10.1186/1472-6947-7-3

Abstract

Background: The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports.

Methods: We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction.

Results: We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74–75), 100% (95% CI: 100–100), and 47% (95% CI: 46–48), respectively.

Conclusion: We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest.
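The four correction steps named in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which open source tools or algorithms they used, so Python's standard `difflib` is swapped in for candidate generation and ranking. The names `LEXICON`, `ABBREVIATIONS`, and `FREQUENCY` are hypothetical toy stand-ins for the UMLS Specialist Lexicon, an abbreviation table, and corpus term counts.

```python
import difflib
import re

# Hypothetical toy lexicon standing in for the UMLS Specialist Lexicon.
LEXICON = {
    "patient", "developed", "fever", "swelling", "injection",
    "site", "after", "vaccination", "and", "at", "the",
}

# Hypothetical abbreviation table, applied before spelling correction.
ABBREVIATIONS = {"pt": "patient", "inj": "injection"}

# Hypothetical term counts, used only to break ties between candidates.
FREQUENCY = {"fever": 50, "swelling": 30, "after": 80}


def expand_abbreviations(tokens):
    """Replace known abbreviations with their expanded forms."""
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def detect_errors(tokens):
    """Step 1: flag positions of tokens that are absent from the lexicon."""
    return [i for i, token in enumerate(tokens) if token not in LEXICON]


def generate_candidates(word, limit=5):
    """Step 2: build a word list of lexicon entries similar to the token."""
    return difflib.get_close_matches(word, sorted(LEXICON), n=limit, cutoff=0.7)


def disambiguate(word, candidates):
    """Step 3: pick the most similar candidate; frequency breaks ties."""
    if not candidates:
        return None

    def score(candidate):
        similarity = difflib.SequenceMatcher(None, word, candidate).ratio()
        return (similarity, FREQUENCY.get(candidate, 0))

    return max(candidates, key=score)


def correct(text):
    """Step 4: apply the chosen corrections and return the cleaned text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = expand_abbreviations(tokens)
    for i in detect_errors(tokens):
        best = disambiguate(tokens[i], generate_candidates(tokens[i]))
        if best is not None:  # leave the token untouched if nothing matches
            tokens[i] = best
    return " ".join(tokens)


if __name__ == "__main__":
    print(correct("Pt developed fevr and sweling at the inj site"))
```

A token with no sufficiently similar lexicon entry is left unchanged, which favors specificity over sensitivity; a real system would need a much larger lexicon and a more careful disambiguation step.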

Highlights

The Institute of Medicine has identified patient safety as a key goal for health care in the United States.
Adverse events following immunization (AEFI) reports, such as those submitted to the U.S. Vaccine Adverse Event Reporting System (VAERS) [3], contain free-text components that need to be processed manually by human encoders.
Four unique challenges arise from linguistic variation found in the free-text components of AEFI reports: (1) synonyms and paraphrases can refer to a single symptom; (2) medical concepts are recorded by providers using abbreviations and acronyms aligned to a particular care setting; (3) the same health care or clinical concept can be described using combinations of different parts of speech; and (4) words are often mistyped, which can cause unpredictable errors [4].

Summary

Introduction

Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. Bates et al. noted that manual chart review is an effective method for identifying different types of adverse events in the research setting, but this approach is too costly for routine use. They emphasized the role of analyzing the free-text components of electronic patient records to increase the chance of capturing these adverse events [2].

Journal: BMC Medical Informatics and Decision Making	Publication Date: Feb 12, 2007
Citations: 53	License type: CC BY 2.0
