APPLICATION OF MACHINE LEARNING TO THE IDENTIFICATION OF E-MAILS AS SPAM

Michelle Tais Garcia Furuya, Danielle Elis Garcia Furuya

doi:10.5747/ce.2020.v12.n3.e327

Open Access

Journal: COLLOQUIUM EXACTARUM
Publication Date: Feb 8, 2021
License type: cc-by-nc-nd

Affiliation: Universidade do Oeste Paulista

Abstract

E-mail is one of the most widely used communication tools today and an example of how technology facilitates the exchange of information. One of the biggest obstacles e-mail services face, however, is spam: unsolicited messages received by a user. Machine learning has gained prominence in recent years as an efficient way to identify spam, and different algorithms can be evaluated to determine which performs best. This study assesses how well machine learning algorithms classify e-mails correctly and identifies which algorithm achieves the highest accuracy. The dataset was obtained from the Kaggle platform and processed in the Orange software with four algorithms: Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Naive Bayes (NB). The data were split into 80% for training and 20% for testing. The results show that Random Forest performed best, with 99% accuracy.
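
The study ran its experiments in the Orange software, not in code. As a rough illustration of the workflow the abstract describes (an 80/20 train/test split followed by an accuracy score), the sketch below implements one of the four algorithms, Naive Bayes, in plain Python. The toy messages, the function names, the whitespace tokenizer, and the add-one smoothing are all assumptions made for this example; none of them come from the paper or its Kaggle dataset.

```python
import math
import random
from collections import Counter

# Hypothetical toy messages, invented for illustration only
# (NOT the Kaggle dataset used in the study).
SPAM = [
    "win free prize now",
    "free money click now",
    "claim your free prize",
    "cheap pills buy now",
    "win cash click here",
]
HAM = [
    "meeting at noon tomorrow",
    "please review the report",
    "lunch with the team today",
    "see the attached report",
    "project meeting moved to monday",
]
DATA = [(t, "spam") for t in SPAM] + [(t, "ham") for t in HAM]


def split_80_20(rows, seed=0):
    """Shuffle the rows and return (80% training, 20% test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.8)
    return rows[:cut], rows[cut:]


def train_nb(rows):
    """Count words per class for a multinomial Naive Bayes model."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in rows:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior score."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        # Smoothed log prior, so a class missing from training never hits log(0).
        score = math.log((class_counts[label] + 1) / (total + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words unseen in training are ignored
                # Add-one (Laplace) smoothed word likelihood.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(predict_nb(model, text) == label for text, label in rows)
    return correct / len(rows)


if __name__ == "__main__":
    train, test = split_80_20(DATA)
    model = train_nb(train)
    print(f"train={len(train)} test={len(test)} accuracy={accuracy(model, test):.2f}")
```

With only ten invented messages, the accuracy printed here says nothing about the figures reported in the study; the sketch only shows how the split and the metric fit together.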
