Abstract

An individual model is not always sufficient to classify an email. Each spam email has features that distinguish it from regular mail, but a single model might not use those features for classification and may thus produce erroneous results. It is therefore essential to cross-verify the output of one model with that of another, which can be done using the ensemble learning technique. Previously, this was done using the same model repeatedly, or different variants of one model. In this paper, however, we use four completely different models and combine them through max voting to optimize the result. The models used are Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), Random Forest (RF), and Decision Tree (DT). After testing all possible combinations, we conclude that the combination of SVM, MNB, and DT gives the optimal result.
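The max-voting scheme described above can be sketched in a few lines of plain Python: each trained model emits a label per email, and the ensemble's label is the majority vote. The per-model predictions below are hypothetical placeholders, not results from the paper.

```python
from collections import Counter

def max_vote(predictions):
    """Return the majority label among the per-model predictions for one sample."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of three trained classifiers (SVM, MNB, DT)
# on three emails; 1 = spam, 0 = not spam.
svm_preds = [1, 0, 1]
mnb_preds = [1, 1, 1]
dt_preds  = [0, 0, 1]

# Each email's final label is the majority of the three models' votes.
ensemble = [max_vote(votes) for votes in zip(svm_preds, mnb_preds, dt_preds)]
print(ensemble)  # → [1, 0, 1]
```

With an odd number of voters, as in the SVM + MNB + DT combination the paper selects, a binary majority vote can never tie.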
