Abstract

Email is a widely used means of communication. However, because sending email costs nothing once an email server and a domain name are available, spam has become a critical problem for email networks. Conventionally, the industry counteracts spam with filters based on rules and Bayesian inference, which reach an accuracy of 98.76%, a level that is still far from satisfactory. Hence, to better protect email users from unsolicited messages containing advertisements, sensitive content, phishing content, and viruses, a new approach is proposed in which email content is filtered by a spam detector built on Bidirectional Encoder Representations from Transformers (BERT). BERT is a new language representation model published by Google that has achieved great success because of its strong capabilities in understanding natural language. After training on a corpus from Kaggle, the BERT-based spam detector reaches a binary accuracy of 99.40% when classifying spam mail.
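For readers unfamiliar with the metric, the following minimal sketch shows one common way to compute binary accuracy for a spam classifier: each predicted spam probability is thresholded, and the share of predictions that match the true labels is reported. The function name `binary_accuracy`, the 0.5 threshold, and the toy labels and probabilities are illustrative assumptions; they are not taken from the paper and do not reproduce its evaluation.

```python
def binary_accuracy(labels, probabilities, threshold=0.5):
    """Return the fraction of messages whose thresholded prediction matches the label.

    labels: list of 0 (ham) or 1 (spam).
    probabilities: list of predicted spam probabilities in [0, 1].
    threshold: assumed decision threshold; 0.5 is a common default, not the paper's stated value.
    """
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    if not labels:
        raise ValueError("at least one example is required")

    # A message is predicted as spam when its probability reaches the threshold.
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for pred, y in zip(predictions, labels) if pred == y)
    return correct / len(labels)


# Hypothetical toy data: five messages, one of which (the third) is misclassified.
labels = [1, 0, 0, 1, 0]
probabilities = [0.97, 0.02, 0.61, 0.88, 0.10]

accuracy = binary_accuracy(labels, probabilities)
print(f"Binary accuracy: {accuracy:.2%}")
```

On this toy data four of the five predictions are correct, so the function returns 0.8. On a real test set, the same calculation applied to the detector's outputs would yield figures such as those reported above.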
