Phishing Email Detection Using Improved RCNN Model With Multilevel Vectors and Attention Mechanism

Yong Fang,Yue Yang,Liang Liu,Cheng Zhang,Cheng Huang

doi:10.1109/access.2019.2913705

Yong Fang, Yue Yang + Show 3 more

Open Access

https://doi.org/10.1109/access.2019.2913705

Copy DOI

Abstract

The phishing email is one of the significant threats in the world today and has caused tremendous financial losses. Although the methods of confrontation are continually being updated, the results of those methods are not very satisfactory at present. Moreover, phishing emails are growing at an alarming rate in recent years. Therefore, more effective phishing detection technology is needed to curb the threat of phishing emails. In this paper, we first analyzed the email structure. Then, based on an improved recurrent convolutional neural networks (RCNN) model with multilevel vectors and attention mechanism, we proposed a new phishing email detection model named THEMIS, which is used to model emails at the email header, the email body, the character level, and the word level simultaneously. To evaluate the effectiveness of THEMIS, we use an unbalanced dataset that has realistic ratios of phishing and legitimate emails. The experimental results show that the overall accuracy of THEMIS reaches 99.848%. Meanwhile, the false positive rate (FPR) is 0.043%. High accuracy and low FPR ensure that the filter can identify phishing emails with high probability and filter out legitimate emails as little as possible. This promising result is superior to the existing detection methods and verifies the effectiveness of THEMIS in detecting phishing emails.

Full Text