Abstract. This study addresses the challenge of accurately identifying depression and suicidal ideation on social media platforms by introducing a novel methodology for unsupervised feature selection and label correction. Using embedding models such as BERT and the Universal Sentence Encoder, we transform textual content into dense numerical vectors that capture the nuanced emotional context of online discussions. A deep neural network (DNN) then extracts distinctive features from these embeddings, and Principal Component Analysis (PCA) reduces their dimensionality. For label correction, we employ clustering techniques, including OPTICS, K-medoids, and hierarchical clustering, that are robust to noisy data points. We then train CNN, DNN, logistic regression, and random forest classifiers and evaluate them using accuracy, precision, recall, F1 score, and AUC. The methodology yields a moderate improvement in the accuracy of classifying depressive and suicidal sentiment, helping to put the vast data available on social media to use in advancing mental health diagnostics and interventions.
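As a rough illustration of the clustering-based label-correction idea, the sketch below groups a handful of toy 2-D points with a basic k-medoids loop and then relabels each point with the majority label of its cluster. The toy points, the farthest-first initialization, the majority-vote rule, and every function name here are assumptions made for illustration only; the abstract does not specify how cluster assignments are turned into corrected labels, and real inputs would be the PCA-reduced embedding features rather than hand-written coordinates.

```python
import math
from collections import Counter


def assign_to_medoids(points, medoids):
    """Return, for each point, the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, max_iter=100):
    """Basic alternating k-medoids with a deterministic farthest-first start."""
    # Assumption: start from point 0, then repeatedly add the point that is
    # farthest from all medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        farthest = max(
            range(len(points)),
            key=lambda i: min(math.dist(points[i], points[m]) for m in medoids),
        )
        medoids.append(farthest)

    for _ in range(max_iter):
        assign = assign_to_medoids(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance
            # to the other members of its cluster.
            best = min(
                members,
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids

    return assign_to_medoids(points, medoids), medoids


def correct_labels(labels, assign):
    """Replace each label with the majority label of the point's cluster."""
    corrected = list(labels)
    for c in set(assign):
        members = [i for i, a in enumerate(assign) if a == c]
        majority, _ = Counter(labels[i] for i in members).most_common(1)[0]
        for i in members:
            corrected[i] = majority
    return corrected


# Toy stand-in for reduced feature vectors: two well-separated groups.
points = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0),
]
# Hypothetical noisy labels: indices 2 and 7 are deliberately flipped.
noisy_labels = [0, 0, 1, 0, 1, 1, 1, 0]

assign, medoids = k_medoids(points, k=2)
corrected = correct_labels(noisy_labels, assign)
print(corrected)
```

Majority voting is only one possible correction rule; it assumes that most labels within a cluster are already right, which may not hold when label noise is heavy or clusters mix both classes.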