Phishing Website Detection Based on Deep Convolutional Neural Network and Random Forest Ensemble Learning

Rundong Yang, Bin Wu, Chunhua Wu, Kangfeng Zheng, Xiujuan Wang

doi:10.3390/s21248281

Abstract

Phishing has become one of the biggest and most effective cyber threats, causing hundreds of millions of dollars in losses and millions of data breaches every year. Currently, anti-phishing techniques require experts to extract phishing site features and rely on third-party services to detect phishing sites. These techniques have two main limitations. First, extracting phishing features requires expertise and is time-consuming. Second, the use of third-party services delays the detection of phishing sites. Hence, this paper proposes an integrated phishing website detection method based on convolutional neural networks (CNN) and random forest (RF). The method can predict the legitimacy of URLs without accessing the web content or using third-party services. The proposed technique uses character embedding to convert URLs into fixed-size matrices, extracts features at different levels using CNN models, classifies the multi-level features using multiple RF classifiers, and, finally, outputs prediction results using a winner-take-all approach. On our dataset, a 99.35% accuracy rate was achieved using the proposed model. An accuracy rate of 99.26% was achieved on the benchmark data, much higher than that of the existing extreme model.
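
The first and last steps of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary `VOCAB`, the fixed length `MAX_LEN`, and the function names are hypothetical, and "winner-take-all" is interpreted here as a simple majority vote over the RF classifiers' labels, which is an assumption. The embedding lookup that turns the index sequence into a matrix, the CNN, and the RF classifiers themselves are omitted.

```python
from collections import Counter
import string

# Hypothetical vocabulary: ASCII letters, digits, and punctuation.
# Index 0 is reserved for padding and for characters outside the vocabulary.
VOCAB = {ch: i + 1 for i, ch in enumerate(
    string.ascii_letters + string.digits + string.punctuation)}
MAX_LEN = 200  # assumed fixed URL length, not a value taken from the paper


def encode_url(url: str, max_len: int = MAX_LEN) -> list[int]:
    """Map each character to an integer index, then truncate or zero-pad to max_len."""
    ids = [VOCAB.get(ch, 0) for ch in url[:max_len]]
    return ids + [0] * (max_len - len(ids))


def winner_take_all(votes: list[int]) -> int:
    """Return the label chosen by the most classifiers (assumed majority vote).

    On a tie, the label that appears first in `votes` wins.
    """
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    ids = encode_url("http://example.com/login")
    print(len(ids))                    # length is always MAX_LEN
    print(winner_take_all([1, 0, 1]))  # labels from three hypothetical RF classifiers
```

Because every URL is mapped to a sequence of the same length, an embedding layer applied to these indices would yield a matrix of fixed shape regardless of the original URL length, which is what a CNN input requires.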

Highlights

On our dataset, a 99.35% accuracy rate was achieved using the proposed model
Different CNN models are evaluated on D1 using CNN1, the proposed convolutional neural network (CNN)
The CNN model has higher accuracy and better detection performance on both datasets

Summary

Introduction

Phishing attacks have become a significant concern owing to an increase in their numbers. Phishing is one of the most widely used, effective, and destructive attacks, in which attackers try to trick users into revealing sensitive personal information, such as their passwords and credit card information. A typical phishing attack technique involves using a phishing website, where the attacker lures users to access fake websites by imitating the names and appearances of legitimate websites, such as eBay, Facebook, and Amazon. It is difficult for the average person to distinguish phishing websites from normal websites because phishing websites appear similar to the websites they imitate. According to the Anti-Phishing Working Group (APWG) Q4 2020 report, in 2020, there was an average of 225,759 phishing attacks per month, an increase of 220% compared to

Methods

Results

Conclusion