Abstract

In this study, we aim to detect in social media texts written in Hebrew girls who are suspected of being anorexic. We constructed a dataset containing 100 blog posts written by females who are probably anorexic, and 100 blog posts written by females who are likely to be non-anorexic. The construction of this dataset was supervised and approved by an international expert on anorexia. We tested several text classification (TC) methods, using various feature sets (content-based and style-based), five machine learning (ML) methods, three RNN models, four BERT models, three basic preprocessing methods, three feature filtering methods, and parameter tuning. Several insights were found as follows. A set of 50-word n-grams (mostly word unigrams) given by an expert was found as a good basic detector. A heuristic process based on the random forest ML method has overcome a combinatorial explosion and led to significant improvement over a baseline result at a level of <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\text{P}\,{=}$ </tex-math></inline-formula> .01. Application of an iterative process that tests combinations of “k out of <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\text{n}'$ </tex-math></inline-formula> ” where <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\text{n}'\,{ &lt; }$ </tex-math></inline-formula> n (n is the number of feature sets) lead to a result of 90.63%, using a combination of 300 features from ten feature sets.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.