AbstractSince the advent of email services, spam emails have been a major concern because users’ security depends on the classification of emails as ham or spam. It’s a malware attack that has been used for spear phishing, whaling, clone phishing, website forgery, and other harmful activities. However, various ensemble Machine Learning (ML) algorithms used for the detection and filtering of spam emails have been less explored. In this research, we offer a ML-based optimized algorithm for detecting spam emails that have been enhanced using Hyper-parameter tuning approaches. The proposed approach uses two feature extraction modules, namely Count-Vectorizer and TFIDF-Vectorizer that provide the most effective classification results when we apply them to three different publicly available email data sets: Ling Spam, UCI SMS Spam, and the Proposed dataset. Moreover, to extend the performance of classifiers we used various ML methods such as Naive Bayes (NB), Logistic Regression (LR), Extra Tree, Stochastic Gradient Descent (SGD), XG-Boost, Support Vector Machine (SVM), Random Forest (RF), Multi-layer Perception (MLP), and parameter optimization approaches such as Manual search, Random search, Grid search, and Genetic algorithm. For all three data sets, the SGD outperformed other algorithms. All of the other ensembles (Extra Tree, RF), linear models (LR, Linear-SVC), and MLP performed admirably, with relatively high precision, recall, accuracies, and F1-score.
Read full abstract