Deep Character-Level Anomaly Detection Based on a Convolutional Autoencoder for Zero-Day Phishing URL Detection

Seok-Jun Bu, Sung-Bae Cho

doi:10.3390/electronics10121492

Abstract

Given the severity of phishing attacks, data-driven approaches that use massive URL observations have been verified, especially in the field of cyber security. However, supervised learning that relies on known attacks is limited in its robustness against zero-day phishing attacks. Moreover, it is known to be critical for the phishing detection task to fully exploit the sequential features of the URL characters. Taken together, to ensure both sustainability and intelligibility, we propose combining a convolution operation, which models the character-level URL features, with a deep convolutional autoencoder (CAE), which accounts for the nature of zero-day attacks. Extensive experiments on three real-world datasets consisting of 222,541 URLs showed the highest performance among the latest deep-learning methods. We demonstrated the superiority of the proposed method by receiver-operating characteristic (ROC) curve analysis in addition to 10-fold cross-validation and confirmed that the sensitivity improved by 3.98% compared to the latest deep model.

Highlights

A phishing attack in its broadest sense can be defined as a scalable act of deception whereby impersonation is used by an attacker to obtain information from an individual [1].
The convolution operation aims to learn a spatial filter to extract features in the local receptive field that shares weights [7], and the long short-term memory (LSTM), a variant of a recurrent neural network (RNN), is a memory cell that stores the weights used for mapping between inputs and outputs [8].
We propose a combination of a convolution operation to model the character-level URL features and a deep autoencoder (AE) to consider the nature of zero-day attacks.

Summary

Introduction

A phishing attack in its broadest sense can be defined as a scalable act of deception whereby impersonation is used by an attacker to obtain information from an individual [1]. Among the most prominent methods, the combination of a convolutional neural network (CNN) and a recurrent neural network (RNN) has been found to significantly improve the detection performance by explicitly modeling the character- and word-level features of phishing attacks [5]. The convolution operation aims to learn a spatial filter to extract features in the local receptive field that shares weights [7], and the long short-term memory (LSTM), a variant of an RNN, is a memory cell that stores the weights used for mapping between inputs and outputs [8]. We propose a combination of a convolution operation to model the character-level URL features and a deep autoencoder (AE) to consider the nature of zero-day attacks. To demonstrate the superiority of the proposed method, we performed receiver-operating characteristic (ROC) curve analysis in addition to 10-fold cross-validation and confirmed that the accuracy improved
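The idea of a weight-shared filter over character-level URL features can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the `ALPHABET`, the `max_len` of 32, and the helper names `one_hot_encode` and `conv1d` are assumptions made for illustration, and the hand-set kernel stands in for weights that a real model would learn.

```python
# Hypothetical character vocabulary; the alphabet used in the paper is not given here.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_encode(url, max_len=32):
    """Map a URL to a max_len x len(ALPHABET) one-hot matrix.

    Characters outside ALPHABET and padding positions become all-zero rows.
    """
    rows = []
    for ch in url.lower()[:max_len]:
        row = [0.0] * len(ALPHABET)
        idx = CHAR_TO_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1.0
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0.0] * len(ALPHABET))
    return rows


def conv1d(matrix, kernel, bias=0.0):
    """Slide one kernel (width x channels) along the character axis.

    The same kernel weights are reused at every position (weight sharing),
    and each output depends only on `width` neighbouring characters
    (the local receptive field).
    """
    width = len(kernel)
    outputs = []
    for start in range(len(matrix) - width + 1):
        total = bias
        for offset in range(width):
            row = matrix[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(row, weights))
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs


# Hand-set kernel of width 2 that responds only to the bigram "//".
slash = CHAR_TO_INDEX["/"]
double_slash_kernel = [
    [1.0 if i == slash else 0.0 for i in range(len(ALPHABET))]
    for _ in range(2)
]
feature_map = conv1d(one_hot_encode("http://a.com"), double_slash_kernel, bias=-1.0)
```

Here `feature_map` is nonzero only at the position where `//` begins. In a convolutional autoencoder, many such filters would be learned from data in the encoder, and a decoder would try to rebuild the one-hot matrix from the resulting feature maps.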

Method

Character-Level URL Model Based on a Convolutional Autoencoder

Phishing URL Classification Based on Reconstruction Errors
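The page gives only the heading for this step. As a hedged sketch of how classification from reconstruction errors commonly works, the code below assumes the autoencoder was fit on benign URLs, so that poorly reconstructed URLs are treated as anomalous. That training setup, the mean-squared-error score, the 0.95 quantile, and all function names are assumptions for illustration, not details confirmed by the source.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def fit_threshold(benign_errors, quantile=0.95):
    """Pick a threshold as an empirical quantile of benign reconstruction errors."""
    ordered = sorted(benign_errors)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def classify(error, threshold):
    """Flag a URL as phishing when its error exceeds the threshold."""
    return error > threshold


def sensitivity(labels, predictions):
    """True-positive rate: detected phishing URLs / all phishing URLs."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

Sweeping the threshold over the observed error values and recording the true-positive and false-positive rates at each setting yields the points of an ROC curve, which is the kind of analysis the abstract refers to.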

Experimental Results

Dataset and Implementation

Phishing Detection Performance

Performance Evaluation by Component

Journal: Electronics
Publication Date: Jun 21, 2021
Citations: 27
License type: CC BY 4.0
