An Analysis of Machine Learning Algorithms and Deep Neural Networks for Email Spam Classification using Natural Language Processing

Md Mohidul Hasan, Ayesha Siddika, Md Golam Rabiul Alam, Syed Mahbubuz Zaman, Md Asif Talukdar

doi:10.1109/soli54607.2021.9672398

Abstract

Due to the extensive use of technology in our daily lives, email has become essential for online correspondence between individuals from all walks of life. As a result, certain individuals have weaponized this service by bulk-mailing malicious emails to recipients with the goal of obtaining some form of classified information. Email classification has therefore become a major area of research, as it enables the identification and isolation of such malicious emails. The objectives of this paper are a robust comparison of several traditional machine learning (ML) algorithms, an exploration of transfer learning with static (non-trainable) pretrained GloVe (Global Vectors for Word Representation) embeddings, and a comparison of several deep learning models trained with GloVe and Keras embeddings separately. Among the ML classifiers, XGBoost achieved the highest evaluation scores. Among the deep learning models, those based on Keras embeddings outperformed those based on GloVe embeddings by a small margin, which shows the efficiency of transfer learning in downstream NLP tasks (part-of-speech tagging).
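
The distinction between a static (non-trainable) pretrained embedding and an embedding learned together with the classifier can be illustrated with a minimal sketch. It is not the paper's implementation: the two-dimensional vectors in `TOY_VECTORS` are invented stand-ins for pretrained GloVe vectors, the four toy emails are hypothetical, and a plain logistic regression over averaged word vectors replaces the deep models the paper compares. The names `train` and `predict` are likewise assumptions made for illustration.

```python
import math

# Hypothetical 2-d "pretrained" vectors; real GloVe vectors have 50 to 300 dimensions.
TOY_VECTORS = {
    "free": [1.0, 0.2], "prize": [0.9, 0.1],
    "meeting": [-0.8, 0.3], "tomorrow": [-1.0, 0.1],
}
EMAILS = [["free", "prize"], ["meeting", "tomorrow"],
          ["prize", "free"], ["tomorrow", "meeting"]]
LABELS = [1, 0, 1, 0]  # 1 = spam, 0 = ham


def mean_vector(words, emb):
    """Average the vectors of the known words in one email."""
    known = [w for w in words if w in emb]
    dim = len(next(iter(emb.values())))
    return known, [sum(emb[w][i] for w in known) / len(known) for i in range(dim)]


def predict(words, emb, weights, bias):
    """Return the estimated spam probability for one tokenized email."""
    _, x = mean_vector(words, emb)
    z = sum(wi * xi for wi, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(emails, labels, table, trainable, epochs=50, lr=0.5):
    """Fit logistic regression by SGD; update the embedding table only if trainable."""
    emb = {w: list(v) for w, v in table.items()}  # work on a copy of the table
    dim = len(next(iter(emb.values())))
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for words, y in zip(emails, labels):
            known, x = mean_vector(words, emb)
            g = predict(words, emb, weights, bias) - y  # d(log loss)/d(logit)
            if trainable:
                # Backpropagate into each word vector (the mean splits the gradient).
                for w in known:
                    for i in range(dim):
                        emb[w][i] -= lr * g * weights[i] / len(known)
            for i in range(dim):
                weights[i] -= lr * g * x[i]
            bias -= lr * g
    return emb, weights, bias


frozen_emb, frozen_w, frozen_b = train(EMAILS, LABELS, TOY_VECTORS, trainable=False)
tuned_emb, tuned_w, tuned_b = train(EMAILS, LABELS, TOY_VECTORS, trainable=True)
```

With `trainable=False` the table returned by `train` is identical to `TOY_VECTORS`, which mirrors the static transfer-learning setting, while `trainable=True` lets the classification loss reshape the word vectors, loosely analogous to an embedding layer trained from scratch or fine-tuned.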

Full Text