Abstract

Spam email has become a major problem in internet communication and, if not identified, can have serious adverse effects on recipients. Many spam filters have been developed, but as spammers continuously improve their techniques, existing filters may become less effective. This paper presents a heterogeneous ensemble approach that combines several methodologically different filters, which work collectively to improve the accuracy and reliability of spam identification. A special procedure for building heterogeneous and homogeneous ensembles with a Bayesian filter as the base learner has been devised, and a framework has been designed and implemented. After being verified intensively on 10 other benchmark data sets, the framework was applied to identify spam emails. Experiments with a spam benchmark corpus indicated that the heterogeneous ensembles achieved more accurate and reliable classifications than the individual filters and the other ensemble filters.
