Monitoring the psychological conditions of social media users through their posts has quickly gained popularity as a relatively easy and cost-effective method during rapidly developing public health crises, such as the COVID-19 pandemic. However, the characteristics of the individuals who create these posts are largely unknown, making it difficult to identify the groups most affected by such crises. In addition, large annotated data sets for mental health conditions are not readily available, so supervised machine learning algorithms can be infeasible or too costly. This study proposes a machine learning framework for the real-time surveillance of mental health conditions that does not require extensive training data. Using survey-linked tweets, we tracked the level of emotional distress during the COVID-19 pandemic according to the attributes and psychological conditions of social media users in Japan. We conducted online surveys of adults residing in Japan in May 2022 and collected their basic demographic information, socioeconomic status, and mental health conditions, along with their Twitter handles (N=2432). We computed emotional distress scores for all the tweets posted by the study participants between January 1, 2019, and May 30, 2022 (N=2,493,682) using a semisupervised algorithm called latent semantic scaling (LSS), with higher values indicating higher levels of emotional distress. After excluding users by age and other criteria, we examined 495,021 (19.85%) tweets generated by 560 (23.03%) individuals (aged 18-49 years) in 2019 and 2020. We estimated fixed-effect regression models to examine their emotional distress levels in 2020 relative to the corresponding weeks in 2019 according to the mental health conditions and characteristics of social media users.
The estimated level of emotional distress of our study participants increased in the week when school closures started (March 2020) and peaked at the beginning of the state of emergency in early April 2020 (estimated coefficient=0.219, 95% CI 0.162-0.276). Their level of emotional distress was unrelated to the number of COVID-19 cases. We found that the government-induced restrictions disproportionately affected the psychological conditions of vulnerable individuals, including those with low income, precarious employment, depressive symptoms, and suicidal ideation. This study establishes a framework for near-real-time monitoring of the emotional distress level of social media users, highlighting the potential to continuously monitor their well-being using survey-linked social media posts as a complement to administrative and large-scale survey data. Given its flexibility and adaptability, the proposed framework can readily be extended to other purposes, such as detecting suicidality among social media users, and can be applied to streaming data for continuous measurement of the conditions and sentiment of any group of interest.
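To illustrate the scoring step described above, the following is a minimal sketch of the general idea behind latent semantic scaling: each word receives a polarity from its similarity to a small set of seed words, and a text is scored by averaging the polarities of its words. It is not the study's implementation. The word vectors, the seed words, and the function names `word_polarity` and `document_score` are all hypothetical and chosen only for illustration; in practice the vectors would be learned from a large corpus rather than written by hand, and the scores would typically be standardized.

```python
import numpy as np


def _unit_rows(matrix):
    """Scale each row to unit length so dot products are cosine similarities."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)


def word_polarity(vectors, seeds):
    """Score every word by its seed-weighted mean cosine similarity.

    vectors: dict mapping word -> embedding (hypothetical toy vectors here)
    seeds:   dict mapping seed word -> weight (+1 distress, -1 non-distress)
    """
    words = list(vectors)
    seed_words = [s for s in seeds if s in vectors]
    word_mat = _unit_rows(np.array([vectors[w] for w in words], dtype=float))
    seed_mat = _unit_rows(np.array([vectors[s] for s in seed_words], dtype=float))
    weights = np.array([seeds[s] for s in seed_words], dtype=float)
    similarities = word_mat @ seed_mat.T          # shape: (n_words, n_seeds)
    scores = similarities @ weights / len(seed_words)
    return dict(zip(words, scores))


def document_score(tokens, polarity):
    """Mean polarity of the known tokens; NaN if no token is known."""
    known = [polarity[t] for t in tokens if t in polarity]
    return float(np.mean(known)) if known else float("nan")


# Hand-made 2-D vectors, for illustration only (assumption, not study data).
toy_vectors = {
    "anxious": [1.0, 0.1],
    "afraid": [0.9, 0.2],
    "calm": [-1.0, 0.1],
    "relaxed": [-0.9, 0.3],
    "lockdown": [0.7, 0.5],
    "picnic": [-0.6, 0.6],
}
# Hypothetical seed words: positive weight marks emotional distress.
toy_seeds = {"anxious": 1, "afraid": 1, "calm": -1, "relaxed": -1}

polarity = word_polarity(toy_vectors, toy_seeds)
distressed = document_score(["lockdown", "anxious", "today"], polarity)
relaxed = document_score(["picnic", "calm", "today"], polarity)
print(f"distressed text: {distressed:.3f}, relaxed text: {relaxed:.3f}")
```

Because only the seed words need manual labels, this kind of scoring avoids the large annotated data sets that a supervised classifier would require, which is the property the framework relies on.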