Spam Mail Classification Using Ensemble and Non-Ensemble Machine Learning Algorithms

Khyati Agarwal, Varun Dutt, Suryavanshi Virendrasingh, Sai Krishna, Prakhar Uniyal

doi:10.1007/978-981-15-7106-0_18

Abstract

Spam has been a prevalent problem ever since the inception of email. However, the use of ensemble (aggregate) and non-ensemble algorithms for detecting and filtering spam remains relatively unexplored. In this paper, we develop several ensemble and non-ensemble machine learning (ML) algorithms for classifying emails as spam or ham (i.e., not spam). Using the Enron-SMS dataset from the UCI ML repository and an 80%/20% training/test split, we develop and calibrate three non-ensemble ML algorithms: K-nearest neighbors (KNN), Naive Bayes, and Support Vector Machine (SVM). We also develop and calibrate ensemble ML algorithms that combine these non-ensemble algorithms via voting, bagging, and boosting. Results reveal that the non-ensemble SVM performed best, with 98.47% accuracy on test data, followed by the ensemble voting algorithm, with 96.80% accuracy on test data. We highlight the implications of using non-ensemble and ensemble methods for spam classification in the real world.
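
The voting idea summarized in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The toy messages, all class and function names, and the choice of base learners are assumptions made for illustration: a 1-nearest-neighbour classifier stands in for KNN, and a simple perceptron (a linear classifier) stands in for the Support Vector Machine. Bagging and boosting are not shown.

```python
import math
import random
from collections import Counter

# Invented toy messages (not from the paper's dataset).
TOY_DATA = [
    ("win free prize now", "spam"),
    ("free cash offer click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("see you at lunch tomorrow", "ham"),
    ("project report due monday", "ham"),
]

def tokens(text):
    return text.lower().split()

def split_80_20(items, seed=0):
    """Shuffle a copy of items and return (train, test) in an 80/20 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""
    def fit(self, data):
        self.class_counts = Counter(label for _, label in data)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in data:
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        def log_score(c):
            n_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total)
            for w in tokens(text):
                score += math.log((self.word_counts[c][w] + 1) / (n_words + len(self.vocab)))
            return score
        return max(sorted(self.class_counts), key=log_score)

class NearestNeighbour:
    """1-nearest neighbour using Jaccard similarity between token sets."""
    def fit(self, data):
        self.examples = [(set(tokens(text)), label) for text, label in data]
        return self

    def predict(self, text):
        query = set(tokens(text))
        def similarity(example):
            union = query | example[0]
            return len(query & example[0]) / len(union) if union else 0.0
        return max(self.examples, key=similarity)[1]

class Perceptron:
    """Linear classifier on bag-of-words counts (stand-in for an SVM)."""
    def fit(self, data, epochs=10):
        self.weights, self.bias = Counter(), 0
        for _ in range(epochs):
            for text, label in data:
                y = 1 if label == "spam" else -1
                if y * (sum(self.weights[t] for t in tokens(text)) + self.bias) <= 0:
                    for t in tokens(text):
                        self.weights[t] += y
                    self.bias += y
        return self

    def predict(self, text):
        score = sum(self.weights[t] for t in tokens(text)) + self.bias
        return "spam" if score > 0 else "ham"

def fit_ensemble(data):
    return [NaiveBayes().fit(data), NearestNeighbour().fit(data), Perceptron().fit(data)]

def ensemble_predict(models, text):
    """Hard voting: return the label predicted by most base classifiers."""
    votes = [model.predict(text) for model in models]
    return Counter(votes).most_common(1)[0][0]

if __name__ == "__main__":
    train, test = split_80_20(TOY_DATA)
    models = fit_ensemble(train)
    for text, label in test:
        print(text, "| true:", label, "| voted:", ensemble_predict(models, text))
```

With three base classifiers and two labels, a hard vote can never tie, which is one reason an odd number of voters is a convenient choice. On a real corpus, each base learner would normally be tuned separately before being combined.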

Full Text