Detecting phishing websites through improving convolutional neural networks with Self-Attention mechanism

Yahia Said, Ahmed A Alsheikhy, Husam Lahza, Tawfeeq Shawly

doi:10.1016/j.asej.2024.102643

Abstract

Emerging technologies have made internet connectivity essential for accessing many services. However, it also raises security concerns, such as the illegal acquisition of private information, passwords, and identifiers. Phishing websites are the first choice for attackers seeking access to users' private data. In these social engineering attacks, the attacker designs a fake website that mimics a real one, lures the victim to it to collect sensitive information, and then redirects the victim to the actual site. Given the importance of detecting phishing websites, a robust detector that filters them and blocks their activity on the Internet is necessary. In this paper, we propose a phishing website detector based on a convolutional neural network (CNN) improved with a self-attention mechanism. The proposed detector processes Uniform Resource Locators (URLs) by treating them as strings. CNN models have proved efficient at handling text strings compared to Long Short-Term Memory (LSTM) networks, which focus on temporal features. Using a CNN allows the model to learn comprehensive features of the URLs and facilitates the detection of phishing ones. The self-attention mechanism is added to enhance the model's focus and detection accuracy. In addition, the training dataset was balanced by generating phishing URLs with a Generative Adversarial Network (GAN). A set of experiments demonstrated the robustness of the proposed detector, which achieved high detection accuracy on the test set. The detector was also tested on unknown URLs and achieved excellent results. The improved CNN reaches a detection precision of 99.7%, which is 2.74% higher than that of the regular CNN model. The reported results show that the self-attention mechanism improves detection accuracy and makes the CNN model more efficient at detecting phishing websites.
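To make the two central ideas of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows (1) one way a URL could be treated as a string, by mapping its characters to integer IDs of fixed length, and (2) scaled dot-product self-attention over a sequence of feature vectors. The vocabulary `VOCAB`, the helper names `encode_url` and `self_attention`, the `max_len` default, and the use of identity query/key/value projections are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import string

# Hypothetical character vocabulary (assumption, not taken from the paper).
VOCAB = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
PAD_ID, UNK_ID = 0, 1  # 0 = padding, 1 = character outside the vocabulary
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(VOCAB)}


def encode_url(url, max_len=32):
    """Map a URL string to a fixed-length list of integer character IDs."""
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in url.lower()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections.

    `features` is a list of equal-length vectors, one per position.
    Learned query/key/value projections are omitted (assumption made
    to keep the sketch short).
    """
    dim = len(features[0])
    output = []
    for query in features:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in features]
        weights = softmax(scores)
        # Weighted average of all positions' vectors.
        output.append([sum(w * value[d] for w, value in zip(weights, features))
                       for d in range(dim)])
    return output
```

In a full model, the integer IDs would typically pass through an embedding layer and convolutional layers before attention is applied, so that attention reweights the learned convolutional features rather than raw characters.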

Full Text