Graph-based phishing detection: URLGBM model driven by machine learning

Abdallah Madani, Abdelali Elkouay, Najem Moussa

doi:10.1080/1206212x.2024.2342710

Abstract

Phishing attacks are a form of social engineering in which deceptive communications imitate a trustworthy source to trick users into revealing confidential information. Anti-phishing systems have made significant strides in recent years, helping internet users protect confidential and private information against such attacks. In this study, we propose a URL graph representation based on a random walk algorithm, specifically PageRank, for weighting URL tokens. To create the graph, an imaginary walker visits the URL tokens one at a time and assigns a value to each token based on the probability of encountering the target URL during the walk. We studied different random walk (RW) variations and their effects on the URL string. The BM25 algorithm was then applied to the resulting token scores to produce a sparse matrix for the classification task. Experiments conducted with logistic regression revealed that the proposed model achieved an accuracy of 98.98%, a false alarm rate of 1.72%, and a missing alarm rate of 0.302%. The model also attained a 97.17% accuracy on a benchmark dataset.
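The token-weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenization rule, the choice to link adjacent tokens with undirected edges, the damping factor of 0.85, and the function names `tokenize_url`, `build_token_graph`, and `pagerank` are all assumptions made for illustration. The BM25 and logistic regression stages are omitted.

```python
import re
from collections import defaultdict


def tokenize_url(url):
    """Split a URL into lowercase alphanumeric tokens (assumed tokenization)."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def build_token_graph(tokens):
    """Connect each token to its neighbor in the URL (assumed edge rule)."""
    graph = defaultdict(set)
    for src, dst in zip(tokens, tokens[1:]):
        if src != dst:
            graph[src].add(dst)
            graph[dst].add(src)  # undirected: the walker may step either way
    for token in tokens:
        graph.setdefault(token, set())  # keep isolated tokens as nodes
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; returns a score per token."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without neighbors is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[src] / len(graph[src]) for src in nodes if node in graph[src]
            )
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Hypothetical example URL, used only to exercise the functions.
scores = pagerank(build_token_graph(tokenize_url("http://login.example.com/secure/login")))
```

In this sketch a token that appears in several positions of the URL gains more neighbors and therefore a higher score, which is one plausible way a random walk could emphasize repeated or central tokens before the scores are passed to a weighting scheme such as BM25.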

Full Text