Robust Drug Use Detection on X: Ensemble Method with a Transformer Approach

Reem Al-Ghannam,Mourad Ykhlef,Hmood Al-Dossari

doi:10.1007/s13369-024-08845-6

Reem Al-Ghannam, Mourad Ykhlef + Show 1 more

Open Access

PDF Available

https://doi.org/10.1007/s13369-024-08845-6

Copy DOI

Export

Save

Cite

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

There is a growing trend for groups associated with drug use to exploit social media platforms to propagate content that poses a risk to the population, especially those susceptible to drug use and addiction. Detecting drug-related social media content has become important for governments, technology companies, and those responsible for enforcing laws against proscribed drugs. Their efforts have led to the development of various techniques for identifying and efficiently removing drug-related content, as well as for blocking network access for those who create it. This study introduces a manually annotated Twitter dataset consisting of 112,057 tweets from 2008 to 2022, compiled for use in detecting associations connected with drug use. Working in groups, expert annotators classified tweets as either related or unrelated to drug use. The dataset was subjected to exploratory data analysis to identify its defining features. Several classification algorithms, including support vector machines, XGBoost, random forest, Naive Bayes, LSTM, and BERT, were used in experiments with this dataset. Among the baseline models, BERT with textual features achieved the highest F1-score, at 0.9044. However, this performance was surpassed when the BERT base model and its textual features were concatenated with a deep neural network model, incorporating numerical and categorical features in the ensemble method, achieving an F1-score of 0.9112. The Twitter dataset used in this study was made publicly available to promote further research and enhance the accuracy of the online classification of English-language drug-related content.

Full Text