Phishing Detection with Popular Search Engines: Simple and Effective

Jun Ho Huh,Hyoungshick Kim

doi:10.1007/978-3-642-27901-0_15

Abstract

AbstractWe propose a new phishing detection heuristic based on the search results returned from popular web search engines such as Google, Bing and Yahoo. The full URL of a website a user intends to access is used as the search string, and the number of results returned and ranking of the website are used for classification. Most of the time, legitimate websites get back large number of results and are ranked first, whereas phishing websites get back no result and/or are not ranked at all.To demonstrate the effectiveness of our approach, we experimented with four well-known classification algorithms – Linear Discriminant Analysis, Naïve Bayesian, K-Nearest Neighbour, and Support Vector Machine – and observed their performance. The K-Nearest Neighbour algorithm performed best, achieving true positive rate of 98% and false positive and false negative rates of 2%. We used new legitimate websites and phishing websites as our dataset to show that our approach works well even on newly launched websites/webpages – such websites are often misclassified in existing blacklisting and whitelisting approaches.KeywordsPhishing detectionURL ReputationClassification

Full Text