A detection research of spams based on machine learning algorithms

Zhe Liu

doi:10.54254/2755-2721/17/20230902

Abstract

The widespread circulation of spam causes considerable inconvenience in people's work and daily lives. It is therefore of practical importance to keep updating spam classification and filtering methods in order to improve the experience of using email. This paper examines linear regression and logistic regression as ways to determine whether a given email is spam or legitimate. The logistic regression model is built on a public data set: the number of entries in the entire set is counted, and the probability of spam is then estimated. The linear regression model is built on the output of the logistic regression model and fits a line representing the probability of spam over a given range of emails. Finally, the results of both models indicate how rampant and widespread spam is, a finding that can raise the public's awareness of the need to examine unknown emails carefully.
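
As an illustration of the kind of logistic regression classifier the abstract refers to, the following is a minimal sketch and not the paper's actual implementation. The two features, the toy data, the learning rate, and the function names (`train_logistic`, `spam_probability`) are all assumptions made for this example; the paper's public data set and feature set are not reproduced here.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def spam_probability(x, w, b):
    """Estimated probability that an email with features x is spam."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data, invented for illustration only:
# [share of promotional words, share of capital letters], label 1 = spam.
X = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
print(spam_probability([0.85, 0.7], w, b))  # high for a spam-like email
print(spam_probability([0.05, 0.1], w, b))  # low for a normal-looking email
```

An email would then be labelled spam when the estimated probability exceeds a chosen threshold, for example 0.5.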

Full Text