Classification of Spam E-mail based on Naïve Bayes Classification Model

Shaopeng Cheng

doi:10.54097/hset.v39i.6640

Abstract

As the volume of spam email rises, the need for more effective anti-spam filters is growing. Phishing attacks can cause extremely large losses for companies and individuals, exceeding 1 billion dollars in a single year. This paper investigates and combines Naïve Bayes classification and a clustering algorithm to identify spam emails. By using sample emails to build a dynamic dictionary of the most frequent words in spam and normal emails, the proposed spam filter will provide a stricter method of blocking spam than the methods used by mail providers such as Google, Yahoo, and Outlook.com. The paper also compares several algorithms currently used to classify spam and discusses future applications of deep learning and machine learning to spam classification. According to the analysis, Google’s algorithm is the most comprehensive, but its rules are less strict than Yahoo’s. Outlook.com, as part of the Microsoft application suite, has a unique algorithm for encrypting and filtering spam. Overall, these results offer guidance for further exploration of spam classification rules that are both comprehensive and strict.

Full Text