Depression is a serious mental health condition that negatively affects thoughts, feelings, and actions. Social media use is growing rapidly, and people increasingly express themselves in their regional languages. In Pakistan and India, many people write in Roman Urdu on social media, which makes Roman Urdu important for predicting depression in these regions. However, previous studies have made no significant contribution to predicting depression from Roman Urdu, either alone or in combination with a structured language such as English. This study aims to create a Roman Urdu dataset for predicting depression risk in two languages: Roman Urdu (a non-structural language) and English (a structural language). Two datasets were used: Roman Urdu data manually converted from English text on Facebook, and English comments from Kaggle. The two datasets were merged for the experiments. Several models were tested: Support Vector Machine (SVM), SVM with a radial basis function kernel (SVM-RBF), Random Forest (RF), and Bidirectional Encoder Representations from Transformers (BERT). Depression risk was classified into three levels: not depressed, moderate, and severe. The experiments show that the SVM achieved the best result, with an accuracy of 0.84 (84%), compared to existing models. The study advances research on depression prediction in Asian countries.
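The merge-and-evaluate step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the rows are made-up toy examples, the helper names `merge_datasets` and `accuracy` are hypothetical, and the predictions are hard-coded rather than produced by a trained SVM, RF, or BERT model. The sketch shows how rows from the two languages share one three-class label set and how accuracy is computed as a fraction between 0 and 1.

```python
from collections import Counter

# The three risk levels named in the abstract.
LABELS = ("not depressed", "moderate", "severe")

# Toy, made-up rows for illustration only; they are NOT from the study's datasets.
roman_urdu_rows = [
    ("aaj ka din bohat acha tha", "not depressed"),
    ("main kuch dino se udaas hoon", "moderate"),
]
english_rows = [
    ("had a great day with friends", "not depressed"),
    ("i feel completely hopeless", "severe"),
]


def merge_datasets(*datasets):
    """Concatenate (text, label) rows and reject labels outside the three classes."""
    merged = []
    for rows in datasets:
        for text, label in rows:
            if label not in LABELS:
                raise ValueError(f"unknown label: {label!r}")
            merged.append((text, label))
    return merged


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels (between 0 and 1)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


merged = merge_datasets(roman_urdu_rows, english_rows)
true_labels = [label for _, label in merged]

# Hypothetical classifier output: three of the four rows are predicted correctly.
predicted = ["not depressed", "moderate", "not depressed", "not depressed"]

class_counts = Counter(true_labels)
score = accuracy(true_labels, predicted)
print(f"accuracy = {score:.2f} ({score:.0%})")  # prints "accuracy = 0.75 (75%)"
```

In a real experiment the hard-coded `predicted` list would be replaced by the output of a classifier trained on a held-out split of the merged data.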