With the rapid rise of mobile communication, Short Message Service (SMS) has become an essential channel for transmitting information. However, the growing volume of unsolicited and harmful spam messages poses significant challenges for both users and mobile network operators. This study evaluates the effectiveness of several machine learning models for SMS spam detection: Random Forest, Gradient Boosting, AdaBoost, Support Vector Machine (SVM), Logistic Regression, and an Ensemble Voting Classifier. The models were evaluated on a dataset of 5,572 SMS messages, each labeled as either spam or ham (legitimate). Hyperparameter tuning was performed on each model to optimize accuracy, and the models were assessed using precision, recall, F1-score, and accuracy. The SVM and the Ensemble Voting Classifier achieved the highest performance, with accuracies of 0.9857 and 0.9848, respectively. Both models also demonstrated superior recall for spam messages, which makes them well suited to real-world spam detection systems. While Random Forest, Gradient Boosting, and AdaBoost also performed well, their slightly lower recall for spam suggests that they may misclassify some spam as legitimate messages. The study highlights the effectiveness of machine learning models in addressing the SMS spam problem, particularly when using ensemble methods. Future research should focus on addressing class imbalance and exploring deep learning approaches to further enhance model performance. These findings offer valuable insights for developing more accurate and scalable SMS spam detection systems.
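The evaluation metrics and the voting idea named above can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation: the function names `spam_metrics` and `majority_vote`, the three toy "models", and the ten toy labels are all hypothetical, and hard (majority) voting is an assumption, since the abstract does not say whether the ensemble votes on labels or on probabilities.

```python
from collections import Counter


def spam_metrics(y_true, y_pred, positive="spam"):
    """Accuracy plus precision, recall and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def majority_vote(predictions):
    """Hard voting: take the most common label per message across models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical toy labels (8 ham, 2 spam), not taken from the study's dataset.
y_true = ["ham"] * 8 + ["spam"] * 2

# Hypothetical predictions from three imaginary classifiers.
model_a = ["ham"] * 8 + ["ham", "spam"]    # misses one spam message
model_b = ["ham"] * 8 + ["spam", "ham"]    # misses the other spam message
model_c = ["ham"] * 7 + ["spam"] * 3       # catches both, one false alarm

ensemble = majority_vote([model_a, model_b, model_c])

print("model_a :", spam_metrics(y_true, model_a))
print("ensemble:", spam_metrics(y_true, ensemble))
```

On this toy data, `model_a` is 90% accurate yet recovers only half of the spam, which is why spam recall is reported alongside accuracy when spam is the minority class. The majority vote corrects each model's individual mistake here, but that outcome is a property of the made-up predictions, not evidence about the study's models.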