Abstract

Spam is unsolicited bulk messaging sent indiscriminately. According to Wikipedia and a Cisco report, more than 31 trillion spam messages were sent in 2009. These spam messages, or “junk mail”, can carry many kinds of content, such as commercial advertising, pornography, viruses, dubious products, get-rich-quick schemes, or quasi-legal services. This paper focuses on text spam; in particular, it describes the process of text spamming and the tricks spammers use. It also describes the implementation of text content analysis and classification using several document processing techniques (stop words, short word forms, regular expressions, stemming, etc.) and a naive Bayesian classifier. Finally, it illustrates how document processing and the naive Bayesian classifier work in practice towards implementing an accurate anti-spam system.

Key words: Text spam, stop words, short word forms, regular expression, stemming, document processing, naive Bayesian classifier.
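The pipeline named in the abstract (regular expressions, short-form expansion, stop-word removal, stemming, then a naive Bayesian classifier) can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the word lists, the `crude_stem` suffix stripper (a toy stand-in for a real stemmer), the `urltoken`/`numtoken` placeholders, the Laplace smoothing, and the four training messages are all assumptions made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical word lists; a real system would use much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "to", "for", "you", "your",
              "and", "of", "at", "in", "on", "are", "we"}
SHORT_FORMS = {"u": "you", "ur": "your", "pls": "please",
               "msg": "message", "txt": "text"}


def crude_stem(word):
    # Toy suffix stripping, standing in for a proper stemming algorithm.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    text = text.lower()
    # Regular expressions collapse links and digit runs into placeholders.
    text = re.sub(r"https?://\S+|www\.\S+", " urltoken ", text)
    text = re.sub(r"\d+", " numtoken ", text)
    tokens = re.findall(r"[a-z]+", text)
    # Expand short forms first so that "u" -> "you" is caught as a stop word.
    tokens = [SHORT_FORMS.get(t, t) for t in tokens]
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(preprocess(text))

    def log_score(self, tokens, label):
        # Assumes every label has at least one training document.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for t in tokens:
            count = self.word_counts[label][t]
            score += math.log((count + 1) / (total + len(vocab)))
        return score

    def classify(self, text):
        tokens = preprocess(text)
        return max(self.LABELS, key=lambda label: self.log_score(tokens, label))


# Made-up training messages, for illustration only.
TRAINING = [
    ("WIN a FREE prize now, claim ur cash at http://example.com", "spam"),
    ("Free cash offer!!! txt WIN to 80082 pls", "spam"),
    ("Are we meeting for lunch tomorrow?", "ham"),
    ("Please send the report to the team", "ham"),
]

clf = NaiveBayes()
for message, label in TRAINING:
    clf.train(message, label)

print(clf.classify("claim ur free prize now"))
print(clf.classify("Can you send the lunch report tomorrow"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a word unseen in one class from driving that class's probability to zero.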
