Abstract

Spam e-mails are unsolicited e-mails received by users of the e-mail service. They cause serious harm to organizations because they waste, among other things, computational and networking resources. To reduce this damage, organizations use anti-spams: software systems that classify e-mails in order to separate legitimate from spam e-mails. The best current commercial and open-source anti-spams, in particular the well-known commercial anti-spam CanIt-PRO, use various techniques, such as blacklists and/or SMTP extensions, to classify e-mails. Unfortunately, both blacklists and SMTP extensions have serious drawbacks, such as low scalability and high computational and network costs. This paper introduces the Open Machine-Learning-Based Anti-Spam (Open-MaLBAS). Unlike the best current anti-spams, Open-MaLBAS uses neither blacklists nor SMTP extensions, relying only on machine learning models for e-mail classification. Open-MaLBAS was compared to CanIt-PRO in a series of experiments on a database of 862,227 real e-mails, collected over three months at the Federal University of Itajuba, Brazil. The e-mails were previously classified by CanIt-PRO. In the experiments, Open-MaLBAS correctly classified 81.48% of the e-mails in the database with the Multi-Layer Perceptron model and 98.13% with the Random Forest model, the two models evaluated. In addition, it classified all e-mails in the database in up to 88% less time than CanIt-PRO. Open-MaLBAS is implemented in Java and released under a free software license. It is available on GitHub.

Highlights

  • An anti-spam (AS) is a software system that classifies e-mails in order to separate legitimate from spam e-mails

  • The second function, performed only when Open-MaLBAS operates in Running Mode, is to classify new e-mails, which are represented by vectors and sent by the Postfix-Active Blacklist (ABL) Module, into the ham and spam classes

  • In terms of accuracy in the classification of e-mails, the Random Forest (RF) model produced better results than those produced by the Multi-Layer Perceptron (MLP) model
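
To make the idea of classifying an e-mail vector into the ham and spam classes concrete, the following Java sketch applies a majority vote over a small set of decision rules, which is the aggregation idea behind a Random Forest. It is only an illustration under stated assumptions: the class name, the three features, and the thresholds are hypothetical and are not taken from Open-MaLBAS, whose trained models and feature set are not described on this page.

```java
import java.util.List;
import java.util.function.Predicate;

/** Toy majority-vote classifier; a stand-in, not the Open-MaLBAS implementation. */
final class VectorClassifierSketch {

    /** The two classes named in the text. */
    enum Label { HAM, SPAM }

    /**
     * Classifies one e-mail vector. Each voter plays the role of one trained
     * tree and returns true when it votes SPAM.
     */
    static Label classify(double[] features, List<Predicate<double[]>> voters) {
        int spamVotes = 0;
        for (Predicate<double[]> voter : voters) {
            if (voter.test(features)) {
                spamVotes++;
            }
        }
        // A strict majority is required for SPAM; ties fall back to HAM.
        return spamVotes * 2 > voters.size() ? Label.SPAM : Label.HAM;
    }

    /** Hypothetical rules over a hypothetical 3-feature vector (assumption). */
    static List<Predicate<double[]>> demoVoters() {
        return List.of(
                v -> v[0] > 0.5,  // feature 0: assumed share of upper-case characters
                v -> v[1] > 3.0,  // feature 1: assumed number of links
                v -> v[2] > 0.0   // feature 2: assumed number of attachments
        );
    }

    public static void main(String[] args) {
        double[] suspicious = {0.8, 5.0, 0.0};
        double[] ordinary = {0.1, 1.0, 0.0};
        System.out.println(classify(suspicious, demoVoters()));
        System.out.println(classify(ordinary, demoVoters()));
    }
}
```

In a real forest each voter would be a decision tree learned from labelled e-mails rather than a hand-written threshold; the voting step is the part this sketch is meant to show.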


Summary

INTRODUCTION

An anti-spam (AS) is a software system that classifies e-mails in order to separate legitimate from spam e-mails. The best current commercial and open-source anti-spams (ASes) make use of address lists on the Internet (blacklists [2], greylists [3], whitelists [2]) for e-mail classification. Unlike the best current commercial and open-source ASes, and in particular the well-known commercial AS CanIt-PRO [14], Open-MaLBAS uses neither blacklists on the Internet nor SMTP extensions, but only machine learning (ML) models for e-mail classification. The paper thoroughly assesses Open-MaLBAS on a large database of real e-mails and compares its results with those obtained by CanIt-PRO. It shows that Open-MaLBAS may be as efficient as the best current commercial and open-source ASes in terms of e-mail classification, and more efficient in terms of the time required for classification. The paper is divided into sections as follows. The stored spam e-mails are integrated into the set of e-mails used in the periodic training of ML models.
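
The two quantities used to compare Open-MaLBAS with CanIt-PRO, the share of correctly classified e-mails and the relative reduction in classification time, can be computed as in the Java sketch below. This is a minimal illustration, not the authors' evaluation code: the class and method names are hypothetical, the reference labels are assumed to be the classifications previously produced by CanIt-PRO, and the sample values in `main` are made up for the example.

```java
import java.util.List;

/** Illustrative evaluation helpers; names are assumptions, not Open-MaLBAS APIs. */
final class EvaluationSketch {

    /** Legitimate (ham) and spam classes. */
    enum Label { HAM, SPAM }

    /** Fraction of e-mails whose predicted label equals the reference label. */
    static double accuracy(List<Label> reference, List<Label> predicted) {
        if (reference.isEmpty() || reference.size() != predicted.size()) {
            throw new IllegalArgumentException("label lists must be non-empty and of equal size");
        }
        int correct = 0;
        for (int i = 0; i < reference.size(); i++) {
            if (reference.get(i) == predicted.get(i)) {
                correct++;
            }
        }
        return (double) correct / reference.size();
    }

    /** Relative time saved versus a baseline; 0.88 means 88% shorter. */
    static double timeReduction(double baselineSeconds, double candidateSeconds) {
        if (baselineSeconds <= 0.0) {
            throw new IllegalArgumentException("baseline time must be positive");
        }
        return (baselineSeconds - candidateSeconds) / baselineSeconds;
    }

    public static void main(String[] args) {
        // Made-up sample: reference labels versus a model's predictions.
        List<Label> reference = List.of(Label.SPAM, Label.HAM, Label.SPAM, Label.HAM);
        List<Label> predicted = List.of(Label.SPAM, Label.HAM, Label.HAM, Label.HAM);
        System.out.printf("accuracy = %.2f%n", accuracy(reference, predicted));
        // Made-up timings: baseline 100 s, candidate 12 s.
        System.out.printf("time reduction = %.2f%n", timeReduction(100.0, 12.0));
    }
}
```

Because the reference labels come from another anti-spam rather than from manual annotation, an accuracy computed this way measures agreement with that reference.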

PRE-PROCESSING MODULE
DATA REPRESENTATION AND PROCESSING
THIRD EXPERIMENT
Findings
CONCLUSION