TCURL: Exploring hybrid transformer and convolutional neural network on phishing URL detection

Chenguang Wang, Yuanyuan Chen

doi:10.1016/j.knosys.2022.109955

Abstract

Phishing is a growing threat in which cybercriminals create counterfeit websites to lure victims and obtain sensitive information such as login credentials and credit card numbers. According to the Q4 2021 Phishing Trends Report by the Anti-Phishing Working Group, the number of phishing attacks has tripled since early 2020. Conventional blacklist methods cannot protect users from attacks that use new phishing URLs. Traditional machine learning methods require complex feature engineering and generally cannot meet detection accuracy requirements. Deep learning methods based on fully convolutional networks or pure transformers attend only to local correlations or only to long-term dependencies, respectively. To address these issues, we propose a hybrid network architecture, called TCURL, which considers both local and global correlations among the characters of URLs. TCURL has two parallel branches, a convolution branch and a transformer branch, and a fusion block that combines the messages from the two branches. The convolution branch provides sufficient positional information, so no extra positional encoding is needed. Through experiments, we explored various design choices to optimize the model. The proposed method achieves accuracies of 96.92%, 99.77%, and 89.73% on three sampled datasets, respectively, outperforming existing methods.
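The two-branch idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the TCURL implementation: the character vocabulary, the embedding size, the single convolution layer, the single attention head, the concatenate-and-average fusion, and all names (`encode_url`, `conv_branch`, `attention_branch`, `hybrid_forward`, `init_params`) are assumptions made for illustration, and the weights are random rather than trained. The sketch only shows how a convolution branch (local windows of characters) and a self-attention branch without positional encoding (all character pairs) might run in parallel over the same character embeddings before being fused.

```python
import numpy as np

# Hypothetical character vocabulary; id 0 is reserved for padding/unknown.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VOCAB)}


def encode_url(url, max_len=64):
    """Map a URL to a fixed-length array of character ids (assumed scheme)."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in url.lower()[:max_len]]
    return np.array(ids + [0] * (max_len - len(ids)), dtype=np.int64)


def conv_branch(x, kernel):
    """1D convolution over characters: each output sees a local window only."""
    k = kernel.shape[0]                    # kernel shape: (k, d_in, d_out), k odd
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))   # zero-pad along the sequence axis
    out = np.stack([
        np.einsum("kd,kde->e", xp[t:t + k], kernel) for t in range(x.shape[0])
    ])
    return np.maximum(out, 0.0)            # ReLU


def attention_branch(x, wq, wk, wv):
    """Single-head self-attention with no positional encoding."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v                     # every position attends to all others


def init_params(d=16, k=3, seed=0):
    """Random, untrained weights; sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    return {
        "embed": rng.normal(0, 0.1, (len(VOCAB) + 1, d)),
        "kernel": rng.normal(0, 0.1, (k, d, d)),
        "wq": rng.normal(0, 0.1, (d, d)),
        "wk": rng.normal(0, 0.1, (d, d)),
        "wv": rng.normal(0, 0.1, (d, d)),
        "w_out": rng.normal(0, 0.1, 2 * d),
    }


def hybrid_forward(url, params, max_len=64):
    """Run both branches in parallel and fuse them (assumed: concat + mean)."""
    x = params["embed"][encode_url(url, max_len)]          # (L, d)
    local = conv_branch(x, params["kernel"])               # (L, d)
    global_ = attention_branch(x, params["wq"], params["wk"], params["wv"])
    fused = np.concatenate([local, global_], axis=-1).mean(axis=0)  # (2d,)
    logit = fused @ params["w_out"]
    return 1.0 / (1.0 + np.exp(-logit))    # score in (0, 1)


if __name__ == "__main__":
    score = hybrid_forward("http://example.com/login", init_params())
    print(f"untrained phishing score: {score:.3f}")
```

Because the weights are untrained, the printed score carries no meaning; the point is the data flow. In this sketch the attention branch on its own is insensitive to character order, so any order sensitivity in the fused features comes from the convolution branch, which loosely mirrors the abstract's remark that no extra positional encoding is needed.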

Full Text