Spam messages have become a significant problem in digital communication, adversely affecting users’ mental health, personal safety, and network resources. Traditional spam detection methods often suffer from low detection rates and high false-positive rates, underscoring the need for more effective solutions. This paper proposes the EGMA model, an ensemble learning-based hybrid approach for spam detection in SMS messages, which combines gated recurrent unit (GRU), multilayer perceptron (MLP), and hybrid autoencoder models through a majority voting algorithm. The EGMA model improves performance by incorporating additional statistical features extracted from message content and by employing text vectorization techniques such as Term Frequency–Inverse Document Frequency (TF-IDF) and CountVectorizer. The proposed model achieved classification accuracies of 99.28% on the SMS Spam Collection dataset, 99.24% on the Email Spam dataset, 99.00% on the Enron-Spam dataset, 98.71% on the Super SMS dataset, and 95.09% on UtkMl’s Twitter Spam dataset. These results indicate that the EGMA model outperforms both its individual component models and existing methods in the literature, providing a robust approach to spam detection and to mitigating the threats that spam messages pose in digital communication.
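The majority voting step that combines the three component models can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote`, the label encoding (1 for spam, 0 for ham), and the example predictions are all assumptions made for illustration, and the trained GRU, MLP, and autoencoder models are replaced by hard-coded label lists.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists by hard majority voting.

    `predictions` holds one list of labels per model, all of equal length.
    With three binary classifiers a tie cannot occur.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_messages = len(predictions[0])
    if any(len(p) != n_messages for p in predictions):
        raise ValueError("all models must label the same number of messages")

    combined = []
    # zip(*predictions) yields, per message, the tuple of votes from each model.
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical hard labels from the three component models (1 = spam, 0 = ham).
gru_pred = [1, 0, 1, 0]
mlp_pred = [1, 0, 0, 0]
autoencoder_pred = [0, 0, 1, 1]

ensemble_pred = majority_vote([gru_pred, mlp_pred, autoencoder_pred])
print(ensemble_pred)  # [1, 0, 1, 0]
```

Each message receives the label chosen by at least two of the three models, so a single model's error on a message is overruled when the other two agree.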