Abstract

Background: Free-text clinical records provide a source of information that complements traditional disease surveillance. To electronically harness these records, they need to be transformed into codified fields by natural language processing algorithms.

Objective: The aim of this study was to develop, train, and validate Clinical History Extractor for Syndromic Surveillance (CHESS), a natural language processing algorithm to extract clinical information from free-text primary care records.

Methods: CHESS is a keyword-based natural language processing algorithm that extracts 48 signs and symptoms, covering those that suggest respiratory infections and gastrointestinal infections, constitutional symptoms, and other signs and symptoms potentially associated with infectious diseases. The algorithm also captures the assertion status (affirmed, negated, or suspected) and symptom duration. Electronic medical records from the National Healthcare Group Polyclinics, a major public sector primary care provider in Singapore, were randomly extracted and manually reviewed by 2 human reviewers, with a third reviewer as the adjudicator. The algorithm was evaluated on 1680 notes, with the human-coded results as the reference standard; half of the data were used for training and the other half for validation.

Results: The symptoms most commonly present within the 1680 clinical records at the episode level were those typically present in respiratory infections, such as cough (744/7703, 9.66%), sore throat (591/7703, 7.67%), rhinorrhea (552/7703, 7.17%), and fever (928/7703, 12.04%). At the episode level, CHESS had an overall performance of 96.7% precision and 97.6% recall on the training dataset and 96.0% precision and 93.1% recall on the validation dataset. Symptoms suggesting respiratory and gastrointestinal infections were all detected with more than 90% precision and recall. CHESS correctly assigned the assertion status in 97.3%, 97.9%, and 89.8% of affirmed, negated, and suspected signs and symptoms, respectively (97.6% overall accuracy). Symptom episode duration was correctly identified in 81.2% of records with known duration status.

Conclusions: We have developed a natural language processing algorithm, CHESS, that achieves good performance in extracting signs and symptoms from primary care free-text clinical records. In addition to the presence of symptoms, the algorithm can also accurately distinguish affirmed, negated, and suspected assertion statuses and extract symptom durations.
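To make the keyword-based approach described in the Methods concrete, the following is a minimal illustrative sketch in Python, not the CHESS implementation itself. The lexicon, the negation and suspicion cue lists, the duration pattern, and the names `Mention` and `extract_mentions` are all hypothetical assumptions made for illustration; the actual CHESS lexicon and rules are not given in this abstract.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mini-lexicon (assumption): CHESS covers 48 signs and symptoms,
# but its actual keyword lists are not reproduced here.
SYMPTOM_KEYWORDS = {
    "cough": ["cough"],
    "sore throat": ["sore throat", "throat pain"],
    "rhinorrhea": ["rhinorrhea", "runny nose"],
    "fever": ["fever", "febrile"],
}

# Hypothetical cue words (assumption) used to decide the assertion status.
NEGATION_CUES = {"no", "denies", "without"}
SUSPICION_CUES = {"possible", "suspected", "likely"}

# Hypothetical duration pattern (assumption): a number followed by a time unit.
DURATION_RE = re.compile(r"\b(\d+)\s*(day|week|month)s?\b")


@dataclass
class Mention:
    symptom: str
    assertion: str            # "affirmed", "negated", or "suspected"
    duration: Optional[str]   # e.g. "3 days", or None if not stated


def extract_mentions(note: str) -> List[Mention]:
    """Extract symptom mentions clause by clause from a free-text note."""
    mentions: List[Mention] = []
    # Naive clause splitting on sentence-level punctuation and newlines.
    for clause in re.split(r"[.;\n]", note.lower()):
        duration_match = DURATION_RE.search(clause)
        duration = duration_match.group(0) if duration_match else None
        for symptom, keywords in SYMPTOM_KEYWORDS.items():
            for keyword in keywords:
                hit = re.search(r"\b" + re.escape(keyword) + r"\b", clause)
                if hit is None:
                    continue
                # Cue words that precede the keyword within the same clause
                # determine the assertion status.
                preceding = set(re.findall(r"[a-z]+", clause[:hit.start()]))
                if preceding & NEGATION_CUES:
                    assertion = "negated"
                elif preceding & SUSPICION_CUES:
                    assertion = "suspected"
                else:
                    assertion = "affirmed"
                mentions.append(Mention(symptom, assertion, duration))
                break  # record at most one mention per symptom per clause
    return mentions


if __name__ == "__main__":
    for mention in extract_mentions("Cough for 3 days. No fever. Possible sore throat."):
        print(mention)
```

In this sketch, a cue word anywhere before the keyword in the same clause sets the assertion status, and any duration found in the clause is attached to every symptom in it. A production system would need more careful scoping rules, a far larger lexicon, and handling of clinical shorthand.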

Highlights

  • Introduction: The world continues to be vulnerable to the threat from infectious diseases.

  • Singapore currently has universal uptake of electronic health records among its public sector health care providers, and syndromic surveillance systems that use electronic medical records (EMRs) to identify syndromes may help to overcome some of these limitations by providing surveillance data that complement existing surveillance methods [5,6].

Summary

Introduction

The world continues to be vulnerable to the threat from infectious diseases. This includes novel emerging infections, changes in the incidence or severity of common circulating pathogens, and the potential use of infectious agents in bioterrorism. The existing surveillance system, with its traditional reliance on physician and laboratory diagnoses and reports, has several limitations that may delay the recognition and notification of an outbreak. These include a dependence on timely recognition and reporting by clinicians, the challenges clinicians face in recognizing the unexpected presentations of novel pathogens, and delays in obtaining laboratory results for agent identification [3,4,5]. Harnessing free-text clinical records electronically requires transforming them into codified fields with natural language processing algorithms.
