Abstract
Spam has been a prevalent problem since the inception of email. However, the use of ensemble (aggregate) and non-ensemble algorithms for detecting and filtering spam has been less explored. In this paper, we develop several ensemble and non-ensemble machine learning (ML) algorithms for classifying emails as spam or ham (i.e., not spam). Using the Enron-SMS dataset from the UCI ML repository with an 80%/20% training/test split, we develop and calibrate the non-ensemble ML algorithms k-nearest neighbors (KNN), Naive Bayes, and Support Vector Machine (SVM). We also develop and calibrate ensemble ML algorithms that combine these non-ensemble algorithms via voting, bagging, and boosting. Results reveal that the non-ensemble SVM performed best, with 98.47% accuracy on the test data, followed by the ensemble voting algorithm with 96.80% accuracy on the test data. We highlight the implications of using non-ensemble and ensemble methods for spam classification in the real world.
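To make the voting ensemble and the 80%/20% split concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The helper names (split_80_20, majority_vote, accuracy) and the toy label lists are hypothetical and chosen only for illustration. Hard (majority) voting is assumed here, since the abstract does not state which voting scheme was used.

```python
import random
from collections import Counter


def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and return (train, test) in an 80/20 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def majority_vote(predictions):
    """Hard voting: `predictions` holds one list of labels per classifier.

    For each message, the label predicted by most classifiers wins.
    With an odd number of classifiers and two labels, ties cannot occur.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def accuracy(y_true, y_pred):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels (NOT results from the paper): each base
# classifier makes one mistake, on a different message.
y_true = ["spam", "ham", "ham", "spam", "ham"]
knn_pred = ["spam", "ham", "spam", "spam", "ham"]
nb_pred = ["ham", "ham", "ham", "spam", "ham"]
svm_pred = ["spam", "spam", "ham", "spam", "ham"]

voted = majority_vote([knn_pred, nb_pred, svm_pred])
print("voting accuracy:", accuracy(y_true, voted))
print("KNN accuracy:", accuracy(y_true, knn_pred))
```

In this toy case the base classifiers disagree on different messages, so the vote corrects each individual error. That need not hold on real data: when the base classifiers make correlated mistakes, a single strong model can outperform the ensemble, which is consistent with the SVM result reported above.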