Abstract

Unsolicited emails, known as spam, are one of the fast growing and costly problems associated with the Internet today. Among the many proposed solutions, a technique using Bayesian filtering is considered as the most effective weapon against spam. Bayesian filtering works by evaluating the probability of different words appearing in legitimate and spam mails and then classifying them based on that probabilities.Most of the current spam email detection systems use keywords to detect spam emails.These keywords can be written as misspellings eg: baank or bannk instead of bank. Misspellings are changed from time to time and hence spam email detection system needs to constantly update the blacklist to detect spam emails containing misspellings. Its impossible to predict all possible misspellings for a given keyword and add those to the blacklist. In this paper a better and more successful approach for improving E-mail content classification for spam control is proposed. It used the Word Stemming or Word Hashing Technique for improving the efficiency of the content based spam filter.The proposed system extract the base or stem of a misspelled or modified word, to detect spam emails. It considers every misspelled keyword applies a word stemming technique and passes the base word to the content based filter. Using a proposed if-then rule, we can decide whether or not this unknown mail is spam (1).This paper also provides an Email archiving solution which classifies the E-mail relating to a person, family, corporation, association, community, or nation. KeywordsFilters, Bayesian, content based spam filter, Word Stemming, Email, Email archiving.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.