HDP-CNN: Highway deep pyramid convolution neural network combining word-level and character-level representations for phishing website detection

Faan Zheng, Qiao Yan, Victor C.M. Leung, F. Richard Yu, Zhong Ming

doi:10.1016/j.cose.2021.102584

Abstract

Phishing has become a prevalent method for attackers to steal users’ private data and commit fraud, posing a serious threat to Internet users. Detecting phishing websites has therefore attracted great interest from both academia and industry. A popular approach is to use a support vector machine (SVM) to detect phishing websites. However, this approach relies on features designated by experts, so the prediction effectiveness of the model depends heavily on the quality of feature extraction. In addition, it cannot handle features that are not identifiable. Deep learning methods have therefore become popular, as they do not require manual feature engineering. However, many deep learning methods learn feature information of uniform resource locators (URLs) only at the character level and ignore the intrinsic connections between words. To address these limitations, we propose a novel highway deep pyramid convolution neural network (HDP-CNN), a deep convolutional network that combines character-level and word-level representation information. HDP-CNN first receives the URL string sequence as input, then performs character-level embedding and word-level embedding separately. Afterward, it uses the Highway network to connect the character-level and word-level embedding representations of the URL and extracts local features of different sizes from the region embedding layer. Finally, it passes them into the designed deep pyramid structure network to capture the global representation of the URL. Our experiments illustrate that the information expressed by embedding vectors of different granularities differs subtly. By combining embedding feature information of different granularities, HDP-CNN performs better than methods based on a single kind of embedding feature information. In our experiments, we construct an imbalanced dataset in which the ratio of benign websites to phishing websites is close to 5:1. The experimental results demonstrate that our method outperforms other methods, with accuracy at 98.30%, true positive rate (TPR) at 99.18%, and true negative rate (TNR) at 94.34%.
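The two input granularities and the highway connection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how URLs are tokenized or how the Highway network is configured, so the delimiter-based word split, the function names, and the weight shapes below are all assumptions. The `highway` function uses the standard highway-layer form y = T(x) * H(x) + (1 - T(x)) * x, applied here to a single vector that could stand for a combined character-level and word-level embedding.

```python
import math
import re


def char_tokens(url):
    """Character-level view: every character of the URL is one token."""
    return list(url)


def word_tokens(url):
    """Word-level view: split on runs of non-alphanumeric characters.
    This splitting rule is an assumption, not taken from the paper."""
    return [w for w in re.split(r"[^A-Za-z0-9]+", url) if w]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def highway(x, w_h, b_h, w_t, b_t):
    """One standard highway layer on a vector x (list of floats).

    H(x) = ReLU(W_h x + b_h) is the transform, T(x) = sigmoid(W_t x + b_t)
    is the gate, and the output is T * H + (1 - T) * x elementwise.
    Weight matrices are square and given as lists of rows (hypothetical shapes).
    """
    def affine(w, b):
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(w, b)]

    h = [max(0.0, v) for v in affine(w_h, b_h)]
    t = [sigmoid(v) for v in affine(w_t, b_t)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

With a strongly negative gate bias the layer passes its input through almost unchanged, and with a strongly positive bias it outputs the transformed features; this gating is what lets a highway layer mix raw and transformed representations.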

Journal: Computers & Security
Publication Date: Dec 22, 2021
Citations: 14

Similar Papers

Intelligent Deep Machine Learning Cyber Phishing URL Detection Based on BERT Features Extraction
Muna Elsadig ... Nihal Alharbi
Electronics | VOL. 11
08 Nov 2022

CNN–MHSA: A Convolutional Neural Network and multi-head self-attention combined approach for detecting phishing websites
Xi Xiao ... Shutao Xia
Neural Networks | VOL. 125
29 Feb 2020

A hybrid deep learning technique for spoofing website URL detection in real-time applications
Bridget C Ujah-Ogbuagu ... Emeka Ogbuju
Journal of Electrical Systems and Information Technology | VOL. 11
24 Jan 2024

Detecting phishing websites through improving convolutional neural networks with Self-Attention mechanism
Yahia Said ... Tawfeeq Shawly
Ain Shams Engineering Journal | VOL. 15
22 Jan 2024
