Abstract

Detecting phishing webpages is an essential task that protects legitimate websites and their users from various malicious activities. Classifying a suspect webpage as phishing or legitimate requires robust and effective features. However, recent phishing attacks usually make phishing webpages resemble legitimate webpages in both visual and functional aspects, which makes feature extraction more difficult. We herein propose SPWalk, an unsupervised feature learning algorithm for phishing detection. In SPWalk, similar property nodes refer to a collection of phishing webpages or a collection of legitimate webpages. We first construct a weblink network whose nodes represent webpages and whose edges represent the reference relationships that connect webpages through hyperlinks or similar textual content. Then, SPWalk applies a network embedding technique to map nodes into a low-dimensional vector space. A biased random walk procedure efficiently integrates both the structural information between nodes and the URL information of each node. The effectiveness and robustness of SPWalk come from three points: (1) phishing attackers do not have full control over reference relationships; (2) the structural regularities generated by diverse reference relationships can be exploited to discriminate between phishing and legitimate webpages; and (3) node URL information makes the learned node representations better suited for phishing detection. Using the node representations as numeric features, we conduct experiments to classify webpages as legitimate or phishing. We demonstrate the superiority of SPWalk over state-of-the-art techniques on phishing detection, especially in terms of precision (over 95%). Even when phishing webpages are well camouflaged by attackers to evade detection, SPWalk consistently exhibits better classification efficacy.
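
To make the idea of a random walk biased by URL information more concrete, here is a minimal Python sketch. It is not the SPWalk algorithm itself: the abstract does not give the transition rule, so the bias used here (favoring neighbors whose URL score is close to the current node's) is an assumption, as are the names `biased_walk`, `graph`, and `url_score` and the toy data. In many network embedding pipelines, walks like these would then be passed to a skip-gram style model to obtain node vectors; that step is omitted.

```python
import random


def biased_walk(graph, url_score, start, length, rng):
    """Return one walk of up to `length` nodes starting at `start`.

    Hypothetical bias (an assumption, not taken from the paper): the next
    node is drawn with probability proportional to how close its URL score
    is to the current node's score. Scores are assumed to lie in [0, 1].
    """
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = graph.get(current, [])
        if not neighbors:
            break  # dead end: stop the walk early
        # Similar URL scores get larger weights; the small constant keeps
        # every weight strictly positive.
        weights = [1.0 - abs(url_score[current] - url_score[n]) + 1e-6
                   for n in neighbors]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy weblink network (illustrative data only): node -> referenced nodes.
graph = {
    "legit_a": ["legit_b", "phish_x"],
    "legit_b": ["legit_a"],
    "phish_x": ["phish_y", "legit_a"],
    "phish_y": ["phish_x"],
}
# Hypothetical URL quality scores in [0, 1]; higher means more trustworthy.
url_score = {"legit_a": 0.9, "legit_b": 0.8, "phish_x": 0.2, "phish_y": 0.1}

rng = random.Random(0)  # fixed seed so the walks are reproducible
walks = [biased_walk(graph, url_score, node, 5, rng)
         for node in graph for _ in range(3)]
```

Under this assumed bias, walks tend to stay among nodes with similar URL scores, which is one plausible way for structural and URL information to be combined in a single sampling step.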

Highlights

  • Phishing is a concrete, widespread threat that combines social engineering with website spoofing

  • We show how SPWalk is in accordance with phishing detection principles, learning node representations conforming to similar property

  • Given the URL quality score of each node in a weblink network, can our model learn node representations conforming to similar property and thereby improve phishing detection performance? To answer this, we present a novel feature learning model called "SPWalk", which implements similar-property-oriented feature learning


Summary

Introduction

Phishing is a concrete, widespread threat that combines social engineering with website spoofing. It leads to various malicious activities, including identity theft, financial gain, unauthorized account access, and credit card fraud. This threat causes tremendous financial losses to Internet users and long-term reputation damage to the legitimate websites targeted by phishing scams. By manipulating textual or graphical elements, attackers make a newly created phishing website look similar to the legitimate one. Another form of attack exploits vulnerabilities in publicly available websites to compromise legitimate websites. The automatically created phishing webpages then share identical domain names and similar appearances with other (legitimate) webpages within the same compromised website.

