Exploring Efficiency of GAN-based Generated URLs for Phishing URL Detection

Tuan Dung Pham, Sy Tuong Hoang, Thi Thanh Thuy Pham, Viet Cuong Ta

doi:10.1109/mapr53640.2021.9585287

Abstract

A URL (Uniform Resource Locator) refers to a resource on the Internet by providing a hyperlink to a website. Different resources are referenced by different network addresses, or URLs. Embedding malware on websites through malicious URLs is one of the most dangerous types of cyberattack today and poses a serious threat to the safety of systems. The most common recent approach to detecting phishing URLs is to train deep learning networks on a large number of URL samples, both malicious and benign. However, the available URL databases contain only a modest number of samples, and their distribution of malicious and non-malicious URL strings is imbalanced. Malicious URLs are difficult to collect and keep up to date because they exist only for a short time: once detected, they are changed again and again. To address this challenge, we propose training a generative adversarial network (GAN), specifically WGAN-GP (Wasserstein GAN with gradient penalty), to generate malicious URLs from the available phishing URL data. We then integrate the generated phishing URLs into the existing URL database and train two URL classifiers, an LSTM and a GRU, to compare the results. Experiments on different quantities of URL samples show that combining WGAN-GP-generated data with the LSTM classifier improves URL classification.

Full Text