Early Detection of Severe Flu Outbreaks using Contextual Word Embeddings

Redouane Karsi, Mounia Zaim, Jamila El

doi:10.14569/ijacsa.2021.0120227

Abstract

The purpose of automated health surveillance systems is to predict the emergence of a disease. In most cases, these systems use a text categorization model to classify clinical texts into categories, each corresponding to an illness. A problem arises when the target classes refer to diseases that share much information, such as symptoms: the classifier has difficulty discriminating the disease under surveillance from other conditions of the same family, which increases the misclassification rate. Clinical texts contain keywords that carry the information needed to distinguish diseases with similar symptoms. However, these specific words are rare and sparse, so they have only a minor impact on the performance of machine learning models. Assuming that emphasizing specific terms improves classification performance, we propose an algorithm that enriches training samples with terms semantically similar to the specific terms, using the deep contextualized word embeddings ELMo. Next, we devise a weighting scheme that combines chi-square and semantic scores to reflect the relatedness between features and the disease under surveillance. We evaluate our model using the SVM algorithm trained on the i2b2 dataset supplemented by documents collected from Ibn Sina hospital in Rabat. Experimental results show a clear improvement in classification performance over baseline methods, with an F-measure reaching 86.54%.

Highlights

Public health surveillance is a significant focus of National health policies
A health surveillance system must be efficient enough to accurately detect the onset of a disease and its progression over time, so our proposed model is designed to meet the following two requirements: 1) reduce the proportion of mild flu-related documents classified as severe flu, which avoids false outbreak alerts
A system for detecting the occurrence of severe forms of flu by using only clinical texts recorded in electronic health record (EHR) is devised through a text classification model with the challenge of discriminating between severe and mild flu-related documents containing many common features

Summary

INTRODUCTION

Public health surveillance is a significant focus of national health policies. It is ensured by collecting epidemiological data from various healthcare facilities in order to detect disease outbreaks early and subsequently plan appropriate response strategies. The risk of misclassification increases, especially for documents related to severe influenza cases, since the features that characterize severe cases occur far less frequently than the features common to mild and severe cases. In this respect, many research efforts attempt to improve feature selection algorithms by highlighting the discriminative power of infrequent specific terms. The idea behind our algorithm is to mitigate the deficiency caused by the scarcity of specific features by adding new features to training samples, in order to counterbalance the preponderance of common features. The algorithm is based on a deep contextualized word representation method named Embeddings from Language Models (ELMo), renowned for its ability to capture fine-grained syntactic and semantic characteristics of words. Experimental results show a significant improvement over ontology-based feature methods and static word embedding models, with a notable decrease in the misclassification rate of test clinical notes related to severe flu and an F-measure of 86.54%.
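The enrichment idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine` and `enrich`, the `threshold` and `top_k` parameters, and the three-dimensional toy vectors are all assumptions made for illustration. In particular, the toy vectors stand in for ELMo embeddings, which are contextual and would in practice be computed per token occurrence rather than looked up in a fixed table.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def enrich(tokens, specific_terms, embeddings, vocabulary,
           threshold=0.8, top_k=2):
    """Append to a document the terms most similar to its specific terms.

    Hypothetical helper: for each specific term found in `tokens`, the
    `top_k` vocabulary terms whose similarity is at least `threshold`
    are added as extra features.
    """
    enriched = list(tokens)
    for tok in tokens:
        if tok not in specific_terms or tok not in embeddings:
            continue
        scored = [
            (cosine(embeddings[tok], embeddings[cand]), cand)
            for cand in vocabulary
            if cand != tok and cand in embeddings
        ]
        scored = [(s, c) for s, c in scored if s >= threshold]
        # Highest similarity first; ties broken alphabetically.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        enriched.extend(cand for _, cand in scored[:top_k])
    return enriched

# Toy vectors (assumed values, not real ELMo output).
embeddings = {
    "pneumonia": [0.9, 0.1, 0.0],
    "hypoxia": [0.8, 0.2, 0.1],
    "dyspnea": [0.85, 0.15, 0.05],
    "cough": [0.1, 0.9, 0.2],
    "fever": [0.2, 0.8, 0.3],
}
note = ["fever", "cough", "pneumonia"]
specific = {"pneumonia"}  # assumed marker of a severe case

enriched_note = enrich(note, specific, embeddings, list(embeddings))
print(enriched_note)
```

In this toy example the common terms "fever" and "cough" are left alone, while the rare term "pneumonia" pulls in its nearest neighbours, so the enriched sample carries more features associated with severe cases.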

RELATED WORK

OUR FEATURE ENGINEERING APPROACH

Text Preprocessing

Word Embeddings Generation

Term Weighting Scheme

RESULTS AND DISCUSSION

Experimental Results and Discussion

CONCLUSION

Journal: International Journal of Advanced Computer Science and Applications	Publication Date: Jan 1, 2021
License type: cc-by
