Online social networking (SN) data form a stream rich in contextual and temporal information and hold promise for predicting suicidal thoughts and behaviors. Combining SN data with machine learning algorithms offers a potential path forward. This research proposes a Max Voting Ensemble classifier applied to a Reddit dataset to identify suicidal ideation. Preprocessing consists of data cleansing, tokenization, and lemmatization, followed by feature extraction with TF-IDF weighting and Word2Vec word embeddings. Several machine learning algorithms are implemented: Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Multinomial Naive Bayes (MNB), AdaBoost, and XGBoost. The outputs of selected machine learning classifiers (MLCs) are combined using the Max Voting Ensemble classifier. The results indicate that the Max Voting Ensemble classifier achieves an improved precision of 91.39% together with an accuracy of 87.5%. Applying ensembling techniques (ET) to SN data has the potential to address the complexities and modeling challenges inherent in predicting acute suicidal ideation over dynamic time scales.
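The max voting step described above can be illustrated with a minimal sketch. It assumes each base classifier has already produced one label per post; the function name `max_vote`, the tie-breaking rule, and the toy label lists are illustrative assumptions, not the implementation or data used in this research. In scikit-learn, the same idea corresponds to `VotingClassifier` with `voting="hard"`.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def max_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-classifier label predictions by max (majority) voting.

    predictions[i][j] is the label classifier i assigns to sample j.
    Ties go to the label voted for by the earliest classifier in the list
    (an assumption made here; the abstract does not specify tie handling).
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same number of samples")

    combined = []
    for sample_votes in zip(*predictions):  # votes for one sample, in classifier order
        counts = Counter(sample_votes)
        # most_common(1) returns the first-seen label among equally frequent ones
        combined.append(counts.most_common(1)[0][0])
    return combined


# Hypothetical outputs of three base classifiers on four posts
# (1 = suicidal ideation, 0 = no ideation); placeholder values only.
svm_pred = [1, 0, 1, 0]
lr_pred = [1, 1, 0, 0]
rf_pred = [0, 1, 1, 0]

ensemble_pred = max_vote([svm_pred, lr_pred, rf_pred])
print(ensemble_pred)  # [1, 1, 1, 0]
```

Each sample receives the label chosen by most base classifiers, so a single classifier's error is outvoted whenever the others agree.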