A Novel Machine Learning Methodology for Detecting Phishing Attacks in Real Time

Vishal Arora, Manoj Misra

doi:10.1007/978-3-030-59817-4_3

Abstract

Phishing is a cybercriminal activity in which the attacker masquerades as a trusted entity and targets unsuspecting users to obtain personal information illegally. Many phishing detection techniques have been proposed, based on blacklists/whitelists, heuristics, search engines, visual similarity, and machine learning. Statistics indicate that the average lifespan of a phishing website is 8–10 h, which makes it difficult for most of these techniques to identify and detect it accurately. Blacklist/whitelist and search-engine-based techniques work in real time but may fail to handle zero-day phishing attacks. To tackle this problem, an approach is needed that studies the dynamic behavior of websites and accurately predicts new phishing websites. Machine learning has previously been used to handle the dynamic behavior of phishing websites. In this paper, we propose a method in which a browser extension makes an API call to a pre-trained machine learning model to fetch the result, thus making machine learning work in real time. Six machine learning classifiers have been rigorously trained and tested on a dataset of 5430 legitimate URLs and 5147 phishing URLs. We use a novel feature by which HTTPS URLs can be accurately identified as phishing or legitimate based on certificate validation. The method also detects phishing websites hidden behind short URLs, in addition to normal URLs, which makes it more robust. The methodology has a quick response time of 1.74 s along with an accuracy of 99.93%, which is better than previous works.
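To make the URL-level signals mentioned in the abstract more concrete, the following is a minimal Python sketch of how a few such features might be computed before being sent to a classifier. It is not the authors' implementation: the function names (`extract_url_features`, `has_valid_certificate`, `host_is_ip`), the feature names, and the short list of URL-shortening hosts are all assumptions chosen for illustration, and the certificate check shown is only one plausible way to validate an HTTPS certificate using the standard library.

```python
import ipaddress
import socket
import ssl
from urllib.parse import urlparse

# Assumption: a tiny illustrative sample of shortener hosts, not the paper's list.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}


def host_is_ip(host: str) -> bool:
    """Return True if the host is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> dict:
    """Compute a few hypothetical lexical features from a URL (no network access)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return {
        "uses_https": parsed.scheme == "https",
        "is_shortened": host in SHORTENER_HOSTS,
        "host_is_ip": host_is_ip(host),
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
    }


def has_valid_certificate(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Hypothetical certificate check: True if a TLS handshake succeeds with
    default verification (trusted chain and matching hostname).
    Requires network access; any connection or verification error yields False."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # ssl.SSLError is a subclass of OSError
        return False


if __name__ == "__main__":
    print(extract_url_features("https://bit.ly/abc"))
```

In a setup like the one the abstract describes, a dictionary of this kind would be assembled on the server side of the API call and passed to the pre-trained model; a shortened URL would presumably be resolved to its destination first, so that the features describe the final website rather than the shortener.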

Full Text