Spam Email Filtering Leveraging Improved Text Convolutional Neural Network

Dongjie Chen

doi:10.54254/2755-2721/8/20230221

Abstract

This paper outlines the development of anti-spam technology over the last two decades and introduces a novel approach that uses a convolutional neural network (CNN) to tackle the problem of spam filtering. The study uses the TREC06c dataset, which contains Chinese spam email data and is split into a training set and a test set. The paper also introduces word embeddings, which convert each word in the text into a real-valued vector that better reflects the semantic relationships between words. The TEXT-CNN algorithm, which applies convolutional neural networks to text data, is then discussed, and the model is modified to improve its performance in spam filtering. The paper concludes that the TEXT-CNN model achieves strong results in identifying spam emails, and that classification efficiency can be further improved by introducing an attention mechanism and a batch processing mechanism. The article also provides some ideas for further improvement.
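
As an illustration of the two ideas summarized above (word embeddings and convolution over text), the following is a minimal sketch in plain Python. It is not the model evaluated in this paper: the vocabulary, the embedding dimension, the kernel widths, and the random weights are all hypothetical placeholders, English tokens stand in for segmented Chinese words, and the attention and batch processing mechanisms are omitted. It only shows how each token is mapped to a real-valued vector and how a kernel spanning several consecutive word vectors yields one pooled feature.

```python
import random


def embed(tokens, table):
    """Look up one real-valued vector per token (the word-embedding step)."""
    return [table[t] for t in tokens]


def conv_max_pool(vectors, kernel, bias=0.0):
    """Slide a kernel covering len(kernel) consecutive word vectors over the
    sequence, apply ReLU, and keep the largest response (max-over-time pooling).
    Assumes the sequence is at least as long as the kernel is wide."""
    width = len(kernel)
    responses = []
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        total = bias + sum(
            w * x
            for k_row, v_row in zip(kernel, window)
            for w, x in zip(k_row, v_row)
        )
        responses.append(max(0.0, total))  # ReLU
    return max(responses)


def text_cnn_features(tokens, table, kernels):
    """Return one pooled feature per kernel for a tokenized message."""
    vectors = embed(tokens, table)
    return [conv_max_pool(vectors, k) for k in kernels]


# Hypothetical toy setup: random embeddings and kernels, fixed seed.
rng = random.Random(0)
dim = 4
vocab = ["free", "prize", "click", "meeting", "tomorrow"]
table = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}
kernels = [
    [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
    for width in (2, 3)  # one kernel spanning 2 words, one spanning 3
]

features = text_cnn_features(["free", "prize", "click", "meeting"], table, kernels)
print(features)
```

In a trained classifier, the embedding table and kernels would be learned from the training set, and the pooled features would feed a final layer that outputs the spam or non-spam decision.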

Full Text