Accuracy Comparison of Different Machine Learning Models in Phishing Detection

Anthony Chandra, Anderies Anderies, Gregorius Gregorius, Alexander Agung Santoso Gunawan, M S John Immanuel

doi:10.1109/icoiact55506.2022.9972107

Abstract

Constantly evolving phishing attacks have forced researchers to develop countermeasures. Most phishing attacks rely on malicious links or URLs that can leak data or information to third parties. Machine learning has proven effective in many analysis tasks, for both known and previously unseen cases. Its ability to reason and to learn on its own, together with its ease of replication, makes it well suited to countering phishing attacks. This paper compares different machine learning algorithms for classifying a URL as legitimate or phishing from a given set of features, using a Web page phishing detection dataset. The algorithms compared are Naive Bayes, K-Nearest Neighbor, Random Forest, Decision Tree, Support Vector Machine, and Logistic Regression. The models were trained on a phishing dataset that had been passed through a preprocessing and encoding layer. The resulting accuracy of each model, along with other evaluation metrics, was recorded and compared. The results are close across models, with the Random Forest algorithm achieving the highest accuracy, 98.04%, on 11429 URLs.
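
The comparison described in the abstract can be sketched as follows. This is only an illustrative outline, not the authors' implementation: it assumes scikit-learn, substitutes synthetic data from `make_classification` for the Web page phishing detection dataset, and uses default hyperparameters, a standard-scaling step, and an 80/20 split, none of which are stated in the abstract. The function name `compare_models` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Train the six classifiers on one split and return test accuracy per model."""
    # Assumption: an 80/20 stratified split; the abstract does not give the ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        "Naive Bayes": GaussianNB(),
        "K-Nearest Neighbor": KNeighborsClassifier(),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Support Vector Machine": SVC(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    scores = {}
    for name, model in models.items():
        # Assumption: feature scaling stands in for the preprocessing layer.
        pipeline = make_pipeline(StandardScaler(), model)
        pipeline.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipeline.predict(X_test))
    return scores


# Synthetic stand-in data; the real dataset's features are not reproduced here.
X, y_raw = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)
# Mimic the encoding step: string class labels mapped to integers.
labels = np.where(y_raw == 1, "phishing", "legitimate")
y = LabelEncoder().fit_transform(labels)

scores = compare_models(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24}{acc:.4f}")
```

Accuracies printed by this sketch reflect the synthetic data only and say nothing about the figures reported in the paper.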

Full Text