Extracting medical entities from social media

Sanja Scepanovic,Khan Baykaner,Enrique Martin-Lopez,Daniele Quercia

doi:10.1145/3368555.3384467

Abstract

Accurately extracting medical entities from social media is challenging because people use informal language with different expressions for the same concept, and they also make spelling mistakes. Previous work either focused on specific diseases (e.g., depression) or drugs (e.g., opioids) or, if working with a wide set of medical entities, only tackled individual and small-scale benchmark datasets (e.g., AskaPatient). In this work, we first demonstrated how to accurately extract a wide variety of medical entities such as symptoms, diseases, and drug names on three benchmark datasets from varied social media sources, and then also validated this approach on a large-scale Reddit dataset. We first implemented a deep-learning method using contextual embeddings that outperformed existing state-of-the-art methods on two existing benchmark datasets, one containing annotated AskaPatient posts (CADEC) and the other containing annotated tweets (Micromed). Second, we created an additional benchmark dataset by annotating medical entities in 2K Reddit posts (made publicly available under the name of MedRed) and showed that our method also performs well on this new dataset. Finally, to demonstrate that our method accurately extracts a wide variety of medical entities on a large scale, we applied the model pre-trained on MedRed to half a million Reddit posts. The posts came from disease-specific subreddits, so we could categorise them into 18 diseases based on the subreddit. We then trained a machine-learning classifier to predict the post's category solely from the extracted medical entities. The average F1 score across categories was 0.87. These results open up new cost-effective opportunities for modeling, tracking and even predicting health behavior at scale.
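The abstract reports an F1 score averaged across the 18 disease categories. As a minimal sketch of how such an average can be computed, the Python below derives a one-vs-rest F1 per category and then takes the unweighted mean. It assumes the average is a macro average, which the abstract does not state. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_category_f1(true_labels, predicted_labels):
    """Return {category: F1}, treating each category one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # guessed category gains a false positive
            fn[truth] += 1   # true category gains a false negative
    scores = {}
    for category in sorted(set(true_labels) | set(predicted_labels)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denominator = 2 * tp[category] + fp[category] + fn[category]
        scores[category] = 2 * tp[category] / denominator if denominator else 0.0
    return scores


def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of the per-category F1 scores (assumed averaging scheme)."""
    scores = per_category_f1(true_labels, predicted_labels)
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: subreddit-derived categories vs. classifier predictions.
true_labels = ["asthma", "asthma", "diabetes", "diabetes", "migraine", "migraine"]
predicted_labels = ["asthma", "diabetes", "diabetes", "diabetes", "migraine", "asthma"]

print(per_category_f1(true_labels, predicted_labels))
print(round(macro_f1(true_labels, predicted_labels), 3))
```

A macro average weights every category equally regardless of how many posts it contains. If the reported figure were instead weighted by category size, the per-category scores would be the same but the final mean would differ.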

