Phishing Message Detection Based on Keyword Matching

Keng-Theen Tham, Kok-Why Ng, Su-Cheng Haw

doi:10.18080/jtde.v11n3.776

Open Access

https://doi.org/10.18080/jtde.v11n3.776

Abstract

This paper proposes using a Naïve Bayes-based algorithm for phishing detection, specifically in spam emails. It compares probability-based and frequency-based approaches and investigates the impact of imbalanced datasets and of stemming as a natural language processing (NLP) technique. Results show that both algorithms perform similarly in spam detection, so the choice between them depends on factors such as efficiency and scalability. Accuracy is influenced by the dataset configuration and by stemming. With imbalanced datasets, the algorithms detect majority-class emails more accurately but struggle to classify minority-class emails. In contrast, balanced datasets yield high overall accuracy in identifying both spam and ham emails. The study finds that stemming has only a minor impact on algorithm performance and occasionally decreases accuracy because of word grouping. Balancing the dataset is therefore crucial for improving algorithm performance and achieving accurate spam email detection. Hence, both the probability-based and the frequency-based Naïve Bayes algorithms are effective for phishing detection when the dataset is balanced. The frequency-based approach, with a balanced dataset and stemming, achieves a balance between recall and precision, while the probability-based method, with a balanced dataset and no stemming, prioritises overall accuracy.
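As a rough illustration of the kind of classifier the abstract describes, the sketch below implements a standard multinomial Naïve Bayes spam filter with Laplace smoothing in plain Python. It is not the authors' probability-based or frequency-based algorithm, which the abstract does not specify. The function names (`tokenize`, `train`, `predict`, `per_class_recall`), the whitespace tokenizer, and the four-message toy dataset are all assumptions made for illustration, and no stemming is applied.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming.
    return text.lower().split()


def train(messages):
    """Count words per class from (text, label) pairs; labels are "spam" or "ham"."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def predict(text, model):
    """Return the label with the highest log posterior score."""
    word_counts, class_counts, vocab = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: share of training messages carrying this label.
        score = math.log(class_counts[label] / total_messages)
        words_in_class = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (words_in_class + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


def per_class_recall(examples, model):
    """Recall per label, which exposes weak minority-class performance."""
    hits, totals = Counter(), Counter()
    for text, label in examples:
        totals[label] += 1
        if predict(text, model) == label:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy data, balanced between the two classes.
train_set = [
    ("verify your account now", "spam"),
    ("urgent verify your password", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(train_set)
```

Reporting recall per class rather than a single accuracy figure is one way to surface the effect the abstract attributes to imbalanced datasets: a classifier can score well overall while misclassifying most minority-class messages.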

Journal: Journal of Telecommunications and the Digital Economy
Publication Date: Sep 30, 2023
License type: CC BY-NC-ND 4.0