Spam detection in SMS communication is essential for maintaining the quality of messaging services and protecting users from unwanted and potentially harmful messages. Arabic SMS spam detection is particularly challenging because the rich morphology and complex structure of the Arabic language can significantly degrade the performance of traditional text classification methods. To address these challenges, this paper presents a novel approach to Arabic SMS spam detection using a hybrid deep learning model that combines a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory (Bi-LSTM) network. The CNN captures local features and patterns in the text, while the Bi-LSTM models long-term dependencies and contextual information. The model was evaluated on a dataset of Arabic SMS messages containing both ham (non-spam) and spam messages. The dataset was preprocessed through text cleaning, tokenization, and padding before training. The model was trained with the Adam optimizer, using early stopping to prevent overfitting, and evaluated on accuracy, precision, recall, and F1 score. It achieved an accuracy of 0.9699, a precision of 0.9739, a recall of 0.9675, and an F1 score of 0.9707, indicating that it detects spam in Arabic SMS messages effectively. The paper also presents the confusion matrix, the ROC curve, and the training and validation loss curves to illustrate the model's performance. These results are relevant to Arabic text classification and spam detection: the proposed hybrid model can be integrated into messaging platforms to strengthen spam filtering and improve user experience. Future work could explore data augmentation, transfer learning, and more advanced hybrid architectures to further improve the model's performance and applicability.
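To make the preprocessing steps and the reported metrics more concrete, the following is a minimal Python sketch of text cleaning, tokenization, and padding, together with a consistency check of the F1 score against the reported precision and recall. The abstract does not specify the cleaning rules, the vocabulary scheme, or the sequence length, so the regular expressions, the function names (`clean`, `build_vocab`, `encode_and_pad`, `f1_score`), and the padding convention below are all illustrative assumptions, not the authors' implementation.

```python
import re

# Assumed cleaning rules: the abstract only says "text cleaning".
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")    # tashkeel marks and tatweel
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza forms
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")       # anything but Arabic letters and whitespace


def clean(text):
    """Strip diacritics, unify alef forms, drop non-Arabic characters."""
    text = DIACRITICS.sub("", text)
    text = ALEF_VARIANTS.sub("\u0627", text)
    text = NON_ARABIC.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def build_vocab(token_lists):
    """Map each token to an integer id; 0 is reserved for padding, 1 for unknown."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 2)
    return vocab


def encode_and_pad(tokens, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, and right-pad with zeros."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical two-message corpus, used only to exercise the pipeline.
messages = ["أهلاً!! 123", "أهلا بك"]
tokenized = [clean(m).split() for m in messages]  # whitespace tokenization
vocab = build_vocab(tokenized)
padded = [encode_and_pad(t, vocab, max_len=5) for t in tokenized]
print(padded)

# The reported precision and recall reproduce the reported F1 score.
print(round(f1_score(0.9739, 0.9675), 4))  # → 0.9707
```

Padded integer sequences of this kind are the usual input to an embedding layer placed in front of the convolutional and Bi-LSTM layers; the sequence length and vocabulary size would need to be chosen from the actual dataset.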