Abstract
In this paper, we propose the use of ensemble machine learning methods, namely the Random Forest algorithm and the Extreme Gradient Boosting (XGBoost) algorithm, for efficient and accurate detection of phishing websites based on their Uniform Resource Locators (URLs). Phishing is one of the most widely executed cybercrimes in the modern digital sphere: an attacker imitates an existing, and often trusted, person or entity in an attempt to capture a victim’s login credentials, account information, and other sensitive data. Phishing websites are visually and semantically similar to the real ones they imitate. The rise in online trading activity has been accompanied by a rise in the number of phishing scams. Cybersecurity positions are among the most difficult to fill, so an automated system for phishing website detection is urgently needed. Machine learning is one of the most feasible approaches to this problem, as it can handle the dynamic nature of phishing techniques while also providing an accurate method of classification.
Highlights
The internet has seen rapid growth in the last decade
According to Verizon’s 2021 Data Breach Investigations Report (DBIR) [1], 3,841 phishing incidents were reported up to May 2021, with data disclosure confirmed in at least 50% of the cases
A Uniform Resource Locator (URL) is a unique identifier that locates a resource on the internet
Summary
The internet has seen rapid growth in the last decade. With the internet connecting billions of people globally, it is critical to acknowledge that the safety and privacy of internet users are not optimal. Cybercrime is on the rise and leads to great financial losses each year. Phishing attacks account for more than 80% of reported security incidents. According to Verizon’s 2021 Data Breach Investigations Report (DBIR) [1], 3,841 phishing incidents were reported up to May 2021, with data disclosure confirmed in at least 50% of the cases. The number of breaches involving phishing shows an 11% increase in 2021 compared to the previous year. Of these attacks, 95% were financially motivated, causing losses of thousands of dollars per minute. A Uniform Resource Locator (URL) is a unique identifier that locates a resource on the internet. It comprises several parts, such as the protocol, domain name, port, path, and query. Machine learning algorithms offer an accurate and efficient way to recognize features derived from these parts and to predict whether a given website is a phishing website or a safe one.
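To make the URL components mentioned above concrete, the following is a minimal sketch of how a URL might be split into its parts and turned into a few simple lexical features using Python's standard library. The function name `url_features` and the particular features chosen (dot count, `@` symbol, HTTPS use, and so on) are illustrative assumptions; the summary does not state which features the paper actually uses.

```python
from urllib.parse import urlsplit, parse_qsl


def url_features(url: str) -> dict:
    """Split a URL into its parts and derive a few simple lexical features.

    The feature set here is hypothetical and chosen only for illustration.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is None when the URL has no netloc
    return {
        # Raw components named in the text: protocol, domain, port, path, query
        "scheme": parts.scheme,
        "host": host,
        "port": parts.port,  # None when no explicit port is given
        "path": parts.path,
        "query": parts.query,
        # Illustrative lexical features a classifier could consume
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        "num_query_params": len(parse_qsl(parts.query)),
    }


if __name__ == "__main__":
    example = "http://login.example.com:8080/account/verify?user=alice&next=home"
    for name, value in url_features(example).items():
        print(f"{name}: {value}")
```

A dictionary like this could be converted into a numeric feature vector and passed to an ensemble classifier such as Random Forest or XGBoost; that training step is omitted here because it depends on a labeled dataset not described in this summary.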