Abstract

Spam detection is a critical task in cybersecurity: it filters out unsolicited and potentially harmful communications. This study investigates how data augmentation affects the performance of Convolutional Neural Network (CNN) models for spam detection. Using the Enron Email Dataset, we implemented six augmentation methods: synonym replacement, random insertion, random swap, random deletion, back translation, and noise addition. These techniques improved performance over the baseline CNN model, which achieved an accuracy of 87.5%, precision of 85.2%, recall of 83.7%, and F1-score of 84.4%. Back translation, the most effective technique, raised accuracy to 90.3% (a gain of 2.8 percentage points) and F1-score to 88.0% (a gain of 3.6 percentage points). These findings demonstrate the potential of data augmentation for improving spam detection systems and provide a foundation for future research. The study also highlights the importance of combining augmentation techniques and of adapting them to different languages and real-world scenarios, which may yield further performance gains.
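
To make two of the token-level augmentations named above concrete, the following is a minimal Python sketch of random swap and random deletion, together with a check that the reported baseline F1-score follows from the reported precision and recall. It is an illustration under stated assumptions, not the study's implementation: the function names, the whitespace tokenization, the swap count, and the deletion probability are all hypothetical choices made for this example.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Hypothetical example message, split on whitespace (an assumption).
    tokens = "claim your free prize now by replying to this email".split()
    print(random_swap(tokens, n_swaps=2, rng=rng))
    print(random_deletion(tokens, p=0.2, rng=rng))
    # Baseline precision 85.2 and recall 83.7 give an F1-score of about 84.4.
    print(round(f1_score(85.2, 83.7), 1))
```

Both functions preserve the label of the message while perturbing its surface form, which is the property that makes such augmentations usable for expanding a labeled spam corpus. Synonym replacement, random insertion, and back translation need external resources (a thesaurus or a translation model) and are therefore left out of this sketch.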

