Analysis of Spam Classification Based on Naive Bayes and Random Forest Model

Kejia Li

doi:10.54254/2754-1169/84/20240817

Abstract

Spam classification has become more and more significant in email filtering and content auditing systems nowadays. Despite the development of many ways for filtering spam, spammers continue to adopt new methods for spam detection, which has left us overwhelmed with spam. Furthermore, robust, and flexible categorization algorithms are necessary to keep up with the constant evolution of spam tactics. The best method for categorizing and filtering spam now is to use machine learning techniques. In this study, a large spam dataset containing 5572 email instances is used in simulations for the spam classification task. This study comparatively analyzes two prevalent machine learning algorithms, namely, Random Forest and Naive Bayes. A detailed description of both algorithms, including their theoretical foundations and practical implementations in spam detection, is provided. In addition, the data was characterized in the study for training the models as well as making predictions. Finally, the effectiveness and performance of each algorithm is shown in the experimental evaluation using four commonly used performance evaluation metrics. Overall, these results providing insights into their strengths and limitations in practical spam filtering applications.

Full Text