Abstract

This paper outlines the development of anti-spam technology over the last two decades and introduces a novel approach that uses a convolutional neural network (CNN) for spam filtering. The study uses the TREC06c dataset, which contains Chinese spam email data and is split into a training set and a test set. The paper also introduces word embeddings, which convert each word in the text into a real-valued vector that better reflects the semantic relationships between words. The TEXT-CNN algorithm, which applies convolutional neural networks to text data, is then discussed, and the TEXT-CNN model is modified to improve its performance in spam filtering. The paper concludes that the TEXT-CNN model performs well at identifying spam emails, and that classification efficiency can be further improved by extending the model with an attention mechanism and a batch processing mechanism. The paper also offers some ideas for further improvement.
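As a rough illustration of the two ideas named above (word embeddings and convolution over text), the following is a minimal pure-Python sketch, not the paper's implementation. The function names (`embed`, `conv_max_pool`, `text_cnn_features`), the toy vocabulary, the embedding size, and the filter widths are all assumptions made for the example; real Chinese email text would first need word segmentation, and a trained model would learn the embedding table and filters rather than drawing them at random.

```python
import random


def embed(tokens, table):
    """Look up each token's real-valued vector (its word embedding)."""
    return [table[t] for t in tokens]


def conv_max_pool(vectors, kernel, bias):
    """Slide one filter over windows of consecutive word vectors,
    apply ReLU, and keep the largest response (max-over-time pooling)."""
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        score = bias + sum(
            w * x
            for k_row, v_row in zip(kernel, window)
            for w, x in zip(k_row, v_row)
        )
        best = max(best, score)
    return best


def text_cnn_features(tokens, table, filters):
    """Return one pooled feature per filter for a tokenized message."""
    vectors = embed(tokens, table)
    return [conv_max_pool(vectors, k, b) for k, b in filters]


# Toy setup (hypothetical values): 4-dimensional embeddings, filters of width 2 and 3.
rng = random.Random(0)
dim = 4
vocab = ["free", "prize", "meeting", "tomorrow", "click"]
table = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}
filters = [
    ([[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)], 0.0)
    for width in (2, 3)
]
features = text_cnn_features(["click", "free", "prize"], table, filters)
```

In a full classifier, the pooled features would feed a final layer that outputs a spam or non-spam score; that layer, and the attention and batch processing extensions mentioned above, are left out of this sketch.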
