Deep learning based phishing website identification system using CNN-LSTM classifier

Vinod Sapkal, Aboo Bakar Khan, Praveen Gupta

doi:10.47974/jios-1343

Abstract

Phishing is an attack in which a fraudulent website impersonates the site of a large organization, typically one that handles money, such as a bank, another financial institution, or an online retailer. Its primary objective is to obtain personally identifiable information from users, such as social security numbers, credit card details, and passwords. As phishing attacks have increased, various techniques have been developed to counter them. Among these are deep learning algorithms, which can learn from and analyze massive datasets; this capability makes them well suited to identifying and preventing phishing attacks. Because phishing websites are complex, many systems have been developed to detect them, but these systems do not achieve the desired performance and have several other shortcomings. This paper proposes a hybrid deep learning-based phishing detection system that is easy to put into practice. First, the input dataset is preprocessed to improve its quality. Clustering and feature selection are then performed to improve accuracy and reduce processing time. The resulting features are fed into a CNN-LSTM classifier, which classifies websites as phishing or legitimate. The proposed hybrid deep learning models combine natural language processing (NLP) features with character embeddings, which allows them to capture high-level relationships between characters. In terms of the evaluation metric used, the proposed models perform better than the other models compared.
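The abstract does not give the classifier's architecture, so the following is only a minimal sketch of how a character-level CNN-LSTM could process a URL, written in dependency-free Python for readability. Everything in it is an assumption for illustration: the input is taken to be a raw URL string, and the character vocabulary, fixed length of 64, embedding size, filter count and width, pooling size, LSTM hidden size, and the names `encode_url` and `TinyCnnLstm` are all hypothetical. The weights are random and untrained, biases are omitted, and the preprocessing, clustering, and feature-selection stages described above are not shown; a practical system would be built and trained with a deep learning framework.

```python
import math
import random

# Hypothetical character vocabulary (assumption): id 0 is padding, UNK_ID marks unseen characters.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(VOCAB)}
UNK_ID = len(VOCAB) + 1
MAX_LEN = 64  # assumed fixed input length


def encode_url(url, max_len=MAX_LEN):
    """Map each character to an integer id, truncate to max_len, and right-pad with 0."""
    ids = [CHAR_TO_ID.get(c, UNK_ID) for c in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyCnnLstm:
    """Untrained CNN-LSTM forward pass with assumed sizes; all weights are random."""

    def __init__(self, emb_dim=8, n_filters=4, width=3, hidden=4, seed=0):
        rng = random.Random(seed)
        self.width, self.hidden = width, hidden
        # Character embedding table: one row per id, including padding and unknown.
        self.emb = rand_matrix(UNK_ID + 1, emb_dim, rng)
        # Each convolution filter is a flattened (width * emb_dim) weight vector.
        self.conv = rand_matrix(n_filters, width * emb_dim, rng)
        # LSTM weights act on [input ; previous hidden], one matrix per gate (i, f, o, g).
        self.gates = [rand_matrix(hidden, n_filters + hidden, rng) for _ in range(4)]
        self.out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

    def conv_relu_pool(self, seq):
        """Valid 1D convolution over character positions, ReLU, then max-pool of size 2."""
        feats = []
        for t in range(len(seq) - self.width + 1):
            window = [v for vec in seq[t:t + self.width] for v in vec]
            feats.append([max(0.0, sum(w * x for w, x in zip(f, window))) for f in self.conv])
        return [[max(a, b) for a, b in zip(feats[i], feats[i + 1])]
                for i in range(0, len(feats) - 1, 2)]

    def lstm_last_hidden(self, seq):
        """Run a single-layer LSTM over the pooled features and return the final hidden state."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in seq:
            z = x + h  # concatenate input features and previous hidden state
            pre = [[sum(w * v for w, v in zip(row, z)) for row in g] for g in self.gates]
            i = [sigmoid(p) for p in pre[0]]
            f = [sigmoid(p) for p in pre[1]]
            o = [sigmoid(p) for p in pre[2]]
            g = [math.tanh(p) for p in pre[3]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h

    def predict_proba(self, url):
        """Return a score in (0, 1) for the 'phishing' class (meaningless until trained)."""
        embedded = [self.emb[i] for i in encode_url(url)]
        h = self.lstm_last_hidden(self.conv_relu_pool(embedded))
        return sigmoid(sum(w * v for w, v in zip(self.out, h)))
```

For example, `TinyCnnLstm().predict_proba("http://example.com/login")` returns a value between 0 and 1. The convolution picks up short local character patterns (such as digit-for-letter substitutions), while the LSTM summarizes how those patterns are ordered along the URL; the score only becomes meaningful after the weights are trained on labeled phishing and legitimate examples.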
