Deep Neural Network with Stacked Denoise Auto Encoder for Phishing Detection

Sumathi Kothandan, Vijayan Sujatha

doi:10.30991/ijmlnce.2019v03i02.005

Abstract

Phishing is the theft of sensitive information, such as credit card details, usernames, passwords and social security numbers, through a fake page that imitates a trusted website. The attacker builds a look-alike webpage, either by copying the legitimate page or by making small manipulations to it, so that online users cannot distinguish the legitimate website from the fake one. A Deep Neural Network (DNN) was introduced to detect phishing Uniform Resource Locators (URLs). Initially, a 30-dimensional feature vector was constructed from URL-based, Hypertext Markup Language (HTML)-based and domain-based features, and these features were processed by the DNN to detect phishing URLs. However, irrelevant, redundant and noisy features in the dataset increase the complexity of the DNN classifier, so feature selection is required for efficient phishing detection. Because feature selection is performed as a separate, independent step, it is time-consuming. In this paper, the feature vector is therefore generated by the DNN itself using a Stacked Denoise Auto Encoder (SDAE). Moreover, noisy data such as missing features reduce the efficiency of phishing detection, so the SDAE is trained to reconstruct a clean input feature vector. The initial input feature vector is corrupted by setting some of its features to zero. The corrupted feature vector is then mapped by a basic auto encoder to a hidden representation, from which the input feature vector is reconstructed. The reconstructed features are given as input to the DNN, which selects the most relevant features and predicts whether a URL is phishing. In this way, the sparse feature representation of the SDAE increases the classification accuracy of the DNN. Experiments are conducted on the Ham, Phishing Corpus and Phishload datasets, and accuracy, precision, recall and F-measure are reported to demonstrate the effectiveness of DNN-SDAE.
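The corrupt-then-reconstruct idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes (30 to 16 to 8), the sigmoid activations, the squared-error objective, the 30% masking rate, the learning rate and the synthetic binary data are all assumptions chosen for illustration, and the names `corrupt`, `DenoisingAutoencoder` and `train` are hypothetical. Each layer is trained to rebuild its clean input from a copy in which some features have been set to zero, and the layers are stacked greedily.

```python
import numpy as np


def corrupt(x, drop_prob, rng):
    """Masking noise: set each feature to zero with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DenoisingAutoencoder:
    """One basic auto encoder layer (assumed: sigmoid units, squared error)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, x):
        return sigmoid(x @ self.W1 + self.b1)

    def train_step(self, x_clean, x_noisy, lr):
        # Forward pass on the corrupted input.
        h = self.encode(x_noisy)
        y = sigmoid(h @ self.W2 + self.b2)
        n = x_clean.shape[0]
        # Backward pass for 0.5 * squared error (summed over features,
        # averaged over the batch), measured against the CLEAN input.
        d_out = (y - x_clean) * y * (1.0 - y) / n
        d_hid = (d_out @ self.W2.T) * h * (1.0 - h)
        self.W2 -= lr * (h.T @ d_out)
        self.b2 -= lr * d_out.sum(axis=0)
        self.W1 -= lr * (x_noisy.T @ d_hid)
        self.b1 -= lr * d_hid.sum(axis=0)
        # Mean squared reconstruction error, returned for monitoring only.
        return float(np.mean((y - x_clean) ** 2))


def train(dae, x, epochs, lr, drop_prob, rng):
    """Full-batch training with a fresh corruption mask every epoch."""
    losses = []
    for _ in range(epochs):
        losses.append(dae.train_step(x, corrupt(x, drop_prob, rng), lr))
    return losses


rng = np.random.default_rng(0)

# Synthetic stand-in for a 30-dimensional feature matrix (assumption:
# binary features with a different "on" probability per feature).
p = rng.random(30)
X = (rng.random((200, 30)) < p).astype(float)

# Greedy stacking: train layer 1 on X, then layer 2 on layer 1's codes.
dae1 = DenoisingAutoencoder(30, 16, rng)
losses1 = train(dae1, X, epochs=200, lr=0.5, drop_prob=0.3, rng=rng)
codes1 = dae1.encode(X)

dae2 = DenoisingAutoencoder(16, 8, rng)
losses2 = train(dae2, codes1, epochs=200, lr=0.5, drop_prob=0.3, rng=rng)
codes2 = dae2.encode(codes1)

print(f"layer 1 reconstruction error: {losses1[0]:.3f} -> {losses1[-1]:.3f}")
print("stacked code shape:", codes2.shape)
```

In a full pipeline, the stacked codes (here `codes2`) or the reconstructed features would be passed to a supervised classifier; that step is omitted here because the abstract does not specify the classifier's architecture.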
