Abstract

Phishing attacks, which have increased rapidly in recent years, are a form of cyber attack that aims to steal the sensitive credentials of unsuspecting users. In general, attackers attempt to deceive users by creating and presenting a fake but visually similar version of a legitimate web page that is already in use. In this study, we propose an approach for recognizing phishing web pages that uses two global image descriptors, GIST and local binary patterns (LBP), which have not previously been employed in the phishing web page recognition literature. To obtain a discriminative representation, we experimented with two visual feature extraction schemes: (1) “holistic” and (2) “multi-level patches”. The “holistic” scheme uses only the whole web page screenshot, whereas the “multi-level patches” scheme divides each screenshot into equally sized smaller crops at an increasing number of levels. To evaluate the proposed approach, we used a publicly available phishing web page dataset that includes screenshots of 14 highly phished brands as well as legitimate web pages, which poses an open-set problem for researchers. The dataset contains 1313 training and 1539 testing samples. The visual signatures extracted with the GIST and LBP descriptors were then fed to various machine learning models such as SVM, Random Forest and XGBoost (regularized gradient tree boosting). In our experiments, XGBoost was the best learner: with the “multi-level patches” representation, we obtained 87.7% (GIST) and 83.1% (LBP) validation accuracy. These results show that the chosen global image descriptors can be successfully employed for detecting and recognizing phishing web pages. In addition, the average time required to process one screenshot with GIST descriptors (around 1.12 sec.) indicates that the proposed scheme could be used effectively as a browser-based plug-in for recognizing the brands of phishing web pages.
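
To make the “multi-level patches” idea concrete, the following is a minimal Python sketch of LBP histograms computed over a grid of patches. It is an illustration under stated assumptions, not the authors' implementation. The grid layout (level k splits the image into a 2^k × 2^k grid), the basic 3×3 LBP variant, the 256-bin histograms, and the function names `lbp_codes`, `histogram` and `multi_level_lbp` are all hypothetical choices made here. For brevity the sketch computes LBP codes once on the whole image and then splits the code map into patches. The resulting vector would then be passed to a classifier such as XGBoost, which is not shown.

```python
# Minimal sketch: basic 3x3 LBP histograms over a multi-level patch grid.
# Assumptions (not from the paper): grayscale input as a list of rows,
# level k splits the code map into a 2**k x 2**k grid, 256-bin histograms.

def lbp_codes(img):
    """Return the 8-bit LBP code of every interior pixel of a grayscale image."""
    h, w = len(img), len(img[0])
    # Neighbour offsets, clockwise starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the center.
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def histogram(codes, bins=256):
    """Normalised histogram of LBP codes (all zeros if the patch is empty)."""
    hist = [0.0] * bins
    n = 0
    for row in codes:
        for c in row:
            hist[c] += 1.0
            n += 1
    return [v / n for v in hist] if n else hist


def multi_level_lbp(img, levels=2):
    """Concatenate per-patch LBP histograms for levels 0 .. levels-1."""
    codes = lbp_codes(img)
    h, w = len(codes), len(codes[0])
    feature = []
    for level in range(levels):
        g = 2 ** level  # assumed grid size at this level
        for gy in range(g):
            for gx in range(g):
                y0, y1 = gy * h // g, (gy + 1) * h // g
                x0, x1 = gx * w // g, (gx + 1) * w // g
                patch = [r[x0:x1] for r in codes[y0:y1]]
                feature.extend(histogram(patch))
    return feature
```

With `levels=1` the function reduces to the “holistic” case, a single histogram over the whole screenshot. Higher levels append one histogram per patch, so the feature length grows with the number of levels.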

Highlights

  • Phishing is a cyber attack that aims to deceive users into sharing personal information such as passwords, user names and ID numbers

  • Since phishing web pages are visually similar to their legitimate counterparts, vision-based approaches have emerged as a way to build effective and efficient classifiers

  • We suggest a phishing detection and brand recognition mechanism by employing two global image descriptors (i.e. GIST and Local Binary Patterns (LBP)) which have been widely used in computer vision


Summary

Introduction

Phishing is a cyber attack that aims to deceive users into sharing personal information such as passwords, user names and ID numbers. In this kind of attack, web pages that visually mimic their legitimate counterparts are delivered to users in order to capture their sensitive information. There are many types of phishing, and they are usually classified according to who the target and the attacker are. In clone phishing, an attacker takes a legitimate e-mail that has already been sent and copies its content into a similar e-mail containing a link to a malicious site. Spear phishing usually targets a specific person or organization. Whaling is a kind of phishing that targets important and wealthy individuals such as CEOs or civil servants [1].

