Machine Learning-Based Email Spam Filter

Jiuyue Zhang

doi:10.56397/ist.2024.05.05

Abstract

The surge in email usage has been accompanied by an unwelcome increase in spam, posing challenges for individuals and organizations. This study addresses the need for efficient spam detection by evaluating three widely used machine learning classifiers: Naive Bayes, Random Forest, and Logistic Regression. Our approach covers data preparation, feature extraction, model training, and performance evaluation using Accuracy, Precision, Recall, and F1-Score. We also analyze the trade-off between accuracy and computational efficiency, which is essential for real-time spam detection systems. A user-friendly interface built with Flask demonstrates the practical application of our findings. The Random Forest Classifier outperforms the other two models: it classifies emails most accurately while maintaining a balance between sensitivity and specificity. These results highlight the potential of machine learning-based spam filters to enhance email security.
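
For readers unfamiliar with the four evaluation metrics named above, the following minimal sketch shows how they can be computed from true and predicted labels. It is an illustration only, not the implementation used in this study: the function name `classification_metrics`, the label strings "spam" and "ham", and the toy label lists are all assumptions introduced here.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute Accuracy, Precision, Recall, and F1-Score for a binary task.

    `positive` names the class treated as the positive one
    (here "spam", an illustrative assumption).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    # Guard every division so empty inputs or a missing class yield 0.0.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration (not data from the study).
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case one spam message is missed and one legitimate message is flagged, so Precision and Recall are both 2/3 while Accuracy is 0.75. Because the legitimate class is larger, Accuracy comes out higher than the other three metrics, which is why a spam filter is usually judged on all four rather than on Accuracy alone.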
