Abstract

Background

Timely data is key to effective public health responses to epidemics. Drug overdose deaths are identified in surveillance systems through ICD-10 codes present on death certificates. ICD-10 coding takes time, but free-text information is available on death certificates prior to ICD-10 coding. The objective of this study was to develop a machine learning method to classify free-text death certificates as drug overdoses to provide faster drug overdose mortality surveillance.

Methods

Using 2017–2018 Kentucky death certificate data, free-text fields were tokenized and features were created from these tokens using natural language processing (NLP). Word, bigram, and trigram features were created as well as features indicating the part-of-speech of each word. These features were then used to train machine learning classifiers on 2017 data. The resulting models were tested on 2018 Kentucky data and compared to a simple rule-based classification approach. Documented code for this method is available for reuse and extensions: https://github.com/pjward5656/dcnlp.

Results

The top scoring machine learning model achieved 0.96 positive predictive value (PPV) and 0.98 sensitivity for an F-score of 0.97 in identification of fatal drug overdoses on test data. This machine learning model achieved significantly higher performance for sensitivity (p<0.001) than the rule-based approach. Additional feature engineering may improve the model’s prediction. This model can be deployed on death certificates as soon as the free-text is available, eliminating the time needed to code the death certificates.

Conclusion

Machine learning using natural language processing is a relatively new approach in the context of surveillance of health conditions. This method presents an accessible application of machine learning that improves the timeliness of drug overdose mortality surveillance. As such, it can be employed to inform public health responses to the drug overdose epidemic in near-real time as opposed to several weeks following events.
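To make the feature construction and the reported F-score concrete, the following is a minimal sketch in plain Python. It is not the authors' code (that is in the linked repository); the function names `tokenize`, `ngram_features`, and `f_score` and the example text are assumptions chosen for illustration, and the part-of-speech features and classifier training are omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the free text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_features(text, max_n=3):
    """Count word, bigram, and trigram features for one free-text field."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


def f_score(ppv, sensitivity):
    """Harmonic mean of positive predictive value and sensitivity."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Hypothetical cause-of-death text, not taken from the study data.
features = ngram_features("Acute fentanyl intoxication")
print(sorted(features))

# Check the reported F-score against the reported PPV and sensitivity.
print(round(f_score(0.96, 0.98), 2))  # prints 0.97
```

The harmonic-mean check reproduces the abstract's figure: 2 × 0.96 × 0.98 / (0.96 + 0.98) ≈ 0.97.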

Highlights

  • Death certificates (DCs) are the primary source for state and local drug overdose (OD) mortality surveillance and are currently the only nationwide source [1]

  • Data may be requested for research purposes from the Kentucky Office of Vital Statistics; the address is: 275 E

  • Machine learning using natural language processing is a relatively new approach in the context of surveillance of health conditions. This method presents an accessible application of machine learning that improves the timeliness of drug overdose mortality surveillance


Summary

Introduction

Death certificates (DCs) are the primary source for state and local drug overdose (OD) mortality surveillance and are currently the only nationwide source [1]. An electronic record with selected DC fields, including the free-text information for the cause of death [6], is transmitted to the National Center for Health Statistics (NCHS) and coded according to the guidelines of the International Classification of Diseases, Tenth Revision (ICD-10) to allow standardized classification of the causes of death [7,8,9,10]. There is a significant time lag between the day of death and the day when an ICD-10-coded DC record is available for identification of a drug OD death. The consensus definition for drug OD mortality surveillance is based on an underlying cause-of-death (UCOD) code in the range X40-X44, X60-X64, X85, or Y10-Y14 [11, 12]. The objective of this study was to develop a machine learning method to classify free-text death certificates as drug overdoses to provide faster drug overdose mortality surveillance.
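The consensus definition quoted above can be sketched as a simple range check on the underlying cause-of-death code. This is an illustrative sketch only: the names `DRUG_OD_RANGES` and `is_drug_overdose_ucod` are hypothetical, and the assumption that codes arrive as a letter followed by two digits (with an optional subcategory after them) may not match every vital statistics file.

```python
import re

# Code ranges from the consensus definition: X40-X44, X60-X64, X85, Y10-Y14.
DRUG_OD_RANGES = [("X", 40, 44), ("X", 60, 64), ("X", 85, 85), ("Y", 10, 14)]


def is_drug_overdose_ucod(code):
    """Return True if an ICD-10 UCOD code falls in a drug overdose range."""
    # Assumed format: one letter, two digits, then an optional subcategory.
    match = re.match(r"^([A-Z])(\d{2})", code.strip().upper())
    if not match:
        return False
    letter, number = match.group(1), int(match.group(2))
    return any(
        letter == range_letter and low <= number <= high
        for range_letter, low, high in DRUG_OD_RANGES
    )


# Example codes chosen for illustration.
for ucod in ["X42", "X44.0", "Y14", "X45", "I21"]:
    print(ucod, is_drug_overdose_ucod(ucod))
```

A check like this only works once ICD-10 coding is complete, which is the delay the free-text classifier is meant to avoid.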
