EDBase: Generating a Lexicon Base for Eating Disorders Via Social Media

Tarique Anwar,Suku Sukunesan,Adrian Shatte,Hannah K Jarman,Wired Team,Matthew Fuller-Tyszkiewicz,Mohammad Abuhassan

doi:10.1109/jbhi.2022.3211151

Abstract

Eating disorders (EDs) are characterised by abnormal eating habits and obsessive thoughts about food, weight, shape, and body image. EDs are experienced by a significant portion of the population. Social media has been identified as a possible source of influence for EDs, and there is growing evidence of a large volume of ED-related discussion on social media platforms such as Twitter. With this growing trend, automatic content analysis for EDs is becoming increasingly important. To date, no comprehensive benchmark ED lexicon exists for identifying ED-related conversations, which would in turn facilitate these content analysis tasks. In this paper, we propose a novel method for generating a lexicon base for ED language, called EDBase. The method starts by collecting over 3.7 million ED-focused tweets. To semantically represent potential ED terminology in a vector space, an ED word embedding model (EDModel) is trained. We then develop a novel multi-seeded hierarchical density-based algorithm with contrasting corpora for ED lexicon expansion. The proposed lexicon expansion algorithm queries the EDModel to expand the seed terms into a comprehensive lexicon base. Our EDBase consists of a (further expandable) list of 3794 high-quality ED terms, quantified by an ED score and linked to their parent terms. The proposed method significantly outperforms all existing alternative baseline methods and models by over 25% in terms of precision and 1500 in terms of true positives. This research is expected to be impactful in the health data science and healthcare community.
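
To make the idea of seed-based lexicon expansion concrete, the following is a minimal sketch and not the multi-seeded hierarchical density-based algorithm proposed in the paper, whose details are not given in the abstract. It swaps in a much simpler technique: breadth-first expansion from seed terms using a cosine-similarity threshold over word vectors, with a frequency-ratio filter standing in for the contrasting corpora. The function name `expand_lexicon`, the thresholds, the toy two-dimensional vectors, the frequency values, and the score defined as a product of similarities along the parent chain are all assumptions made for illustration.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(embeddings, seeds, domain_freq, general_freq,
                   sim_threshold=0.9, min_ratio=2.0, max_depth=2):
    """Breadth-first expansion of seed terms (illustrative, hypothetical).

    A candidate is added when it is close enough to a term already in the
    lexicon and is relatively more frequent in the domain corpus than in a
    general (contrasting) corpus. Each added term records its parent and a
    score equal to the parent's score times the similarity to the parent.
    """
    lexicon = {s: {"parent": None, "score": 1.0, "depth": 0}
               for s in seeds if s in embeddings}
    queue = deque(lexicon)
    while queue:
        parent = queue.popleft()
        info = lexicon[parent]
        if info["depth"] >= max_depth:
            continue  # do not expand beyond the allowed depth
        for word, vec in embeddings.items():
            if word in lexicon:
                continue
            sim = cosine(embeddings[parent], vec)
            if sim < sim_threshold:
                continue
            # Contrast filter: keep terms over-represented in the domain corpus.
            ratio = domain_freq.get(word, 0.0) / max(general_freq.get(word, 0.0), 1e-9)
            if ratio < min_ratio:
                continue
            lexicon[word] = {"parent": parent,
                             "score": info["score"] * sim,
                             "depth": info["depth"] + 1}
            queue.append(word)
    return lexicon


# Toy data, invented for illustration only (not taken from EDModel).
embeddings = {
    "anorexia": (1.0, 0.0),
    "bulimia": (0.95, 0.1),
    "purging": (0.8, 0.4),
    "food": (0.9, 0.2),
    "football": (0.0, 1.0),
}
domain_freq = {"bulimia": 0.004, "purging": 0.003, "food": 0.01, "football": 0.0001}
general_freq = {"bulimia": 0.0001, "purging": 0.0002, "food": 0.01, "football": 0.005}

lexicon = expand_lexicon(embeddings, ["anorexia"], domain_freq, general_freq)
for term, info in lexicon.items():
    print(term, info["parent"], round(info["score"], 3))
```

In this toy run, "purging" is reached only through "bulimia", which illustrates how linking each term to its parent yields a hierarchy, while "food" is close to the seed in vector space but is rejected by the contrast filter because it is equally common in the general corpus.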

Full Text