Abstract

The phenomenon of suicide has been a focal point for social scientists since Durkheim. The internet and social media sites provide new ways for people to express positive feelings, but they are also platforms for expressing suicidal ideation or depressed thoughts. Most of these posts are not about real suicide, and some of them are a cry for help. Nevertheless, suicide- and depression-related content varies among platforms, and it is not evident how a researcher can find these materials in the mass data of social media. Our paper uses a corpus of more than four million Instagram posts related to mental health problems. After defining the initial corpus, we present two different strategies to find the sociologically relevant content in the noisy environment of social media. The first approach starts with topic modeling (Latent Dirichlet Allocation), the output of which serves as the basis of a supervised classification method based on advanced machine-learning techniques. The other strategy is built on an artificial neural network-based word embedding language model. Based on our results, the combination of topic modeling and neural network word embedding methods seems to be a promising way to find research-related content in a large digital corpus. Our research can provide added value in the detection of possible self-harm events. With complex techniques (such as topic modeling and word embedding methods), it is possible to identify the most problematic posts and the most vulnerable users.

Highlights

  • The phenomenon of suicide has been a focus of social scientists since Durkheim (1897, 2002)

  • We used two main methods, Latent Dirichlet Allocation and a neural network-based word embedding model, to find the theory-relevant posts among the varied content that remained in our corpora after the data cleaning process mentioned earlier

  • LDA topic modeling has been widely used in text mining research for a long time
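
As a rough illustration of the LDA step named in the highlights, the sketch below fits a tiny topic model with a collapsed Gibbs sampler in plain Python. It is not the authors' implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the number of topics, and the toy token lists in `posts` are all assumptions made for the example. The returned per-post topic proportions are the kind of features that could, in principle, feed a supervised classifier as described in the abstract.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Give every token a random initial topic and record it in the counts.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed topic proportions; each row sums to 1.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return theta


# Hypothetical, already-tokenized posts (invented for illustration only).
posts = [
    ["sad", "alone", "tired", "sad"],
    ["gym", "workout", "fitness", "gym"],
    ["alone", "tired", "sad", "night"],
    ["fitness", "workout", "healthy", "gym"],
]
theta = lda_gibbs(posts, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

On a real corpus a library implementation would normally replace this hand-written sampler; the sketch only shows what the topic proportions are and how they arise from token-level topic assignments.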


Summary

Introduction

The phenomenon of suicide has been a focus of social scientists since Durkheim (1897, 2002). In his classic work “Suicide,” he analyzed the spatial and temporal variation of suicide. He argued that in most cases, social causes could be found behind this variation. The topic had always been central to scientific discourse, but his work drew even more attention to it.

Koltai, Centre for Social Sciences, Hungarian Academy of Sciences Centre of Excellence, Budapest, Hungary
Recent Studies of the Field
Data and Methods
The Application of Topic Model
The Application of Word-Embedding
Findings
Discussion
