Abstract
Although network security protocols and cryptography provide a degree of security for Internet use, they do not eliminate persistent security problems. Driven by the spread of e-mail technology and by profit, unscrupulous businesses send promotional emails indiscriminately to large numbers of mailboxes, and even fuel an underground trade in private mailbox information. Existing spam filters rely on blacklists and whitelists, sensitive-word matching, and similar techniques, but they cannot effectively filter all forms of spam and often filter out legitimate mail, which creates additional trouble for users. With the rise of artificial intelligence, machine learning algorithms have been applied to spam recognition, including decision trees, Boosting, K-nearest neighbours, support vector machines (SVM), and Bayesian algorithms. These methods, based on traditional statistics, can classify data sets whose classes differ substantially, and they are often combined with rule-based expert systems to classify spam. However, as spam types diversify, the old classification rules prove relatively rigid, and new types of mail are misjudged. In addition, statistics-based natural language processing methods depend on pre-trained fixed dictionaries; for new words and polysemous words they cannot provide word vectors with accurate semantics, which makes classification difficult. This paper studies the application of five machine learning algorithms to spam detection: an improved naive Bayes algorithm, the A Lite Bidirectional Encoder Representations from Transformers (ALBERT) dynamic word vector algorithm, the Bidirectional Gated Recurrent Unit (BiGRU) algorithm, the Inverted Multi-Index with Weighted Naive Bayes (IMI-WNB) algorithm, and a clustering analysis algorithm.