The Development of the Open Machine-Learning-Based Anti-Spam (Open-MaLBAS)

Isaac C. Ferreira, Otavio A. S. Carpinteiro, Edmilson M. Moreira, Edvard M. Oliveira, Bruno T. Kuehne, Marcelo V. C. Aragao

doi:10.1109/access.2021.3118901


Open Access


Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 3	License type: CC BY 4.0

Affiliation: Federal University of Itajubá

Abstract

Spam e-mails are unsolicited e-mails received by users of the e-mail service. Spam e-mails cause serious harm to organizations because they waste, among other things, their computational and networking resources. To reduce this damage, organizations use anti-spams: software systems that classify e-mails in order to separate legitimate e-mails from spam. The best current commercial and open-source anti-spams, and in particular the well-known commercial anti-spam CanIt-PRO, use various techniques, such as blacklists and/or SMTP extensions, to classify e-mails. Unfortunately, both blacklists and SMTP extensions have serious drawbacks, such as low scalability and high computational and network costs. This paper introduces the Open Machine-Learning-Based Anti-Spam (Open-MaLBAS). Unlike the best current anti-spams, Open-MaLBAS uses neither blacklists nor SMTP extensions, relying only on machine learning models for e-mail classification. Open-MaLBAS was compared to CanIt-PRO in a series of experiments on a database of 862,227 real e-mails collected over three months at the Federal University of Itajubá, Brazil. The e-mails had previously been classified by CanIt-PRO. In the experiments, Open-MaLBAS correctly classified 81.48% and 98.13% of the e-mails in the database using, respectively, the two models evaluated: Multi-Layer Perceptron and Random Forest. In addition, its times to classify all e-mails in the database were up to 88% shorter than those of CanIt-PRO. Open-MaLBAS is implemented in Java and released under a free software license for free use. It is available on GitHub.
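Because the database was labeled beforehand by CanIt-PRO, the reported percentages presumably measure how often Open-MaLBAS assigns the same label as CanIt-PRO, and "up to 88% shorter" is a relative reduction in total classification time. The Java sketch below shows both computations under that reading; the class name, method names, and toy inputs are hypothetical and are not data from the paper.

```java
import java.util.List;

/** Hypothetical helpers: label agreement rate and relative time reduction. */
public final class EvaluationSketch {

    /** Fraction of e-mails whose predicted label equals the reference label (e.g., CanIt-PRO's). */
    static double agreementRate(List<String> predicted, List<String> reference) {
        if (predicted.isEmpty() || predicted.size() != reference.size()) {
            throw new IllegalArgumentException("label lists must be non-empty and of equal length");
        }
        int matches = 0;
        for (int i = 0; i < predicted.size(); i++) {
            if (predicted.get(i).equals(reference.get(i))) {
                matches++;
            }
        }
        return (double) matches / predicted.size();
    }

    /** Relative reduction of a time with respect to a baseline: 1 - time / baseline. */
    static double relativeReduction(double time, double baselineTime) {
        if (baselineTime <= 0) {
            throw new IllegalArgumentException("baseline time must be positive");
        }
        return 1.0 - time / baselineTime;
    }

    public static void main(String[] args) {
        // Toy labels, not taken from the paper's database.
        List<String> reference = List.of("spam", "ham", "ham", "spam", "ham");
        List<String> predicted = List.of("spam", "ham", "spam", "spam", "ham");
        System.out.println(agreementRate(predicted, reference)); // 0.8 (4 of 5 agree)

        // Toy timings in seconds: 30 s versus a 120 s baseline.
        System.out.println(relativeReduction(30.0, 120.0)); // 0.75, i.e. 75% shorter
    }
}
```

Under this reading, a time "88% shorter" corresponds to `relativeReduction` returning 0.88.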

Highlights

An anti-spam (AS) is a software system that classifies e-mails in order to separate legitimate from spam e-mails
The second function, performed only when Open-MaLBAS operates in Running Mode, is to classify new e-mails, represented as vectors and sent by the Postfix-Active Blacklist (ABL) Module, into the ham and spam classes
In terms of accuracy in the classification of e-mails, the Random Forest (RF) model produced better results than those produced by the Multi-Layer Perceptron (MLP) model
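In Running Mode, Open-MaLBAS receives e-mails already represented as numeric vectors and assigns each one to the ham or spam class; the Random Forest (RF) model gave the better accuracy. A random forest classifies a vector by letting each of its trees vote and taking the majority class. The Java sketch below illustrates only that voting step: the hand-written threshold "trees", the feature indices, the tie-breaking rule, and all names are hypothetical stand-ins for trained trees, not code from Open-MaLBAS.

```java
import java.util.List;
import java.util.function.Function;

public final class ForestVoteSketch {

    enum Label { HAM, SPAM }

    /** Majority vote over the trees' predictions; ties go to HAM (a hypothetical, conservative choice). */
    static Label classify(List<Function<double[], Label>> trees, double[] features) {
        int spamVotes = 0;
        for (Function<double[], Label> tree : trees) {
            if (tree.apply(features) == Label.SPAM) {
                spamVotes++;
            }
        }
        return spamVotes * 2 > trees.size() ? Label.SPAM : Label.HAM;
    }

    public static void main(String[] args) {
        // Hypothetical depth-1 "trees" (stumps) over made-up feature indices.
        List<Function<double[], Label>> trees = List.of(
            v -> v[0] > 0.5 ? Label.SPAM : Label.HAM,
            v -> v[1] > 3.0 ? Label.SPAM : Label.HAM,
            v -> v[2] > 0.2 ? Label.SPAM : Label.HAM
        );
        double[] email = {0.9, 1.0, 0.4}; // toy feature vector
        System.out.println(classify(trees, email)); // SPAM (2 of 3 votes)
    }
}
```

In a trained forest each tree would be deeper and learned from bootstrap samples of the labeled vectors; only the aggregation by majority vote is shown here.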

Summary

INTRODUCTION

An anti-spam (AS) is a software system that classifies e-mails in order to separate legitimate from spam e-mails. The best current commercial and open-source anti-spams (ASes) use address lists on the Internet, such as blacklists [2], greylists [3], and whitelists [2], for e-mail classification. Unlike these ASes, and in particular the well-known commercial AS CanIt-PRO [14], Open-MaLBAS uses neither Internet blacklists nor SMTP extensions, relying only on machine learning (ML) models for e-mail classification. The paper thoroughly assesses Open-MaLBAS on a large database of real e-mails and compares its results with those obtained by CanIt-PRO. It shows that Open-MaLBAS may be as effective at classifying e-mails as the best current commercial and open-source ASes, and faster in terms of the time required for classification. The stored spam e-mails are integrated into the set of e-mails used in the periodic training of the ML models.
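Stored spam e-mails are merged into the set of e-mails used for the periodic training of the ML models. A minimal Java sketch of that bookkeeping follows, assuming each e-mail is already a labeled feature vector; the `TrainingSet` class, its methods, and the deduplication by vector content are hypothetical illustrations, not the Open-MaLBAS implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class TrainingSetSketch {

    /** A labeled example: feature vector plus class (true = spam). */
    record Example(double[] features, boolean spam) {}

    /** Hypothetical training set keyed by vector content, so identical vectors are stored once. */
    static final class TrainingSet {
        private final Map<List<Double>, Example> examples = new LinkedHashMap<>();

        void add(Example e) {
            examples.putIfAbsent(key(e.features()), e);
        }

        /** Merges spam vectors stored since the last training run; returns how many were new. */
        int mergeStoredSpam(List<double[]> storedSpam) {
            int before = examples.size();
            for (double[] v : storedSpam) {
                add(new Example(v, true));
            }
            return examples.size() - before;
        }

        /** Copy of the current examples, e.g. for the next periodic training run. */
        List<Example> snapshot() {
            return new ArrayList<>(examples.values());
        }

        private static List<Double> key(double[] v) {
            return Arrays.stream(v).boxed().toList(); // value-based key for the array contents
        }
    }

    public static void main(String[] args) {
        TrainingSet set = new TrainingSet();
        set.add(new Example(new double[] {0.1, 0.0}, false)); // toy ham vector
        int added = set.mergeStoredSpam(List.of(
            new double[] {0.9, 1.0},
            new double[] {0.9, 1.0}, // duplicate of the previous vector
            new double[] {0.7, 0.5}));
        System.out.println(added);                 // 2
        System.out.println(set.snapshot().size()); // 3
        // A periodic job would now retrain the models on set.snapshot().
    }
}
```

Keying by vector content is one simple way to keep repeated spam from being over-weighted in retraining; whether Open-MaLBAS deduplicates at all is not stated in this summary.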

PRE-PROCESSING MODULE

DATA REPRESENTATION AND PROCESSING


THIRD EXPERIMENT

Findings

VIII. CONCLUSION


Similar Papers

Machine Learning Based Classification for Spam Detection
Serkan Keskin ... Onur Sevli
Sakarya University Journal of Science | VOL. 28
30 Apr 2024

Development of Proposed Ensemble Model for Spam e-mail Classification
Akhilesh Kumar Shrivas ... S M Ghosh
Information Technology and Control | VOL. 50
24 Sep 2021

Random Forests Spam Email Classification System
Khongbantabam Susila Devi
Journal of Computer Engineering & Information Technology | VOL. 07
01 Jan 2018

Detection of Spam Email
Manish Panwar ... Jayesh Rajesh Jogi
American Journal of Innovation in Science and Engineering | VOL. 1
30 Dec 2022
