Abstract

In this paper, we propose the use of ensemble machine learning methods, such as the Random Forest algorithm and the Extreme Gradient Boosting (XGBoost) algorithm, for efficient and accurate detection of phishing websites based on their Uniform Resource Locators (URLs). Phishing is one of the most widely executed cybercrimes in the modern digital sphere: an attacker imitates an existing, and often trusted, person or entity in an attempt to capture a victim’s login credentials, account information, and other sensitive data. Phishing websites are visually and semantically similar to the legitimate ones they imitate. The rise in online trading activity has resulted in a rise in the number of phishing scams. Because cybersecurity jobs are among the most difficult to fill, an automated system for phishing website detection is urgently needed. Machine learning is one of the most feasible approaches to this problem, as it can handle the dynamic nature of phishing techniques while also providing accurate classification.

Highlights

  • The internet has seen rapid growth in the last decade

  • According to Verizon’s 2021 Data Breach Investigations Report (DBIR) [1], 3,841 phishing incidents were reported up to May 2021, with data disclosure confirmed in at least 50% of cases

  • A Uniform Resource Locator (URL) is a unique identifier that locates a resource on the internet

Summary

Overview

The internet has seen rapid growth in the last decade. With the internet connecting billions of people globally, it is critically important to acknowledge that the safety and privacy of internet users are far from optimal. Cybercrime is on the rise and leads to great financial losses each year. Phishing attacks account for more than 80% of reported security incidents. According to Verizon’s 2021 Data Breach Investigations Report (DBIR) [1], 3,841 phishing incidents were reported up to May 2021, with data disclosure confirmed in at least 50% of cases. The number of breaches involving phishing shows an 11% increase in 2021 compared to the previous year. Of these attacks, 95% were financially motivated, causing losses of thousands of dollars per minute.

A Uniform Resource Locator (URL) is a unique identifier that locates a resource on the internet. It consists of several parts, such as the protocol, domain name, port, path, and query. Machine learning algorithms offer an accurate and efficient way to recognize these features and to predict whether a given website is a phishing website or a safe one.
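To make the URL parts named above concrete, the following minimal sketch splits a URL into protocol, domain name, port, path, and query using Python's standard `urllib.parse` module, and derives a few simple numeric features from them. The function name `url_features` and the particular features chosen (URL length, dot count, presence of `@`) are illustrative assumptions, not the feature set used in the paper.

```python
from urllib.parse import urlsplit, parse_qs


def url_features(url: str) -> dict:
    """Split a URL into its parts and derive a few illustrative features.

    The feature choices here are hypothetical examples, not the paper's.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is None when the URL has no netloc
    return {
        "protocol": parts.scheme,                      # e.g. "http" or "https"
        "domain": host,                                # lower-cased host name
        "port": parts.port,                            # None if not given explicitly
        "path": parts.path,
        "num_query_params": len(parse_qs(parts.query)),
        "url_length": len(url),
        "num_dots_in_domain": host.count("."),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
    }


if __name__ == "__main__":
    example = "https://example.com:8443/login/index.php?user=a&next=home"
    for name, value in url_features(example).items():
        print(f"{name}: {value}")
```

A dictionary like this, computed for each URL in a labeled dataset, is the kind of tabular input that a classifier such as Random Forest or XGBoost would then be trained on.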

Ensemble Methods
Related Work
Limitations of Existing System
System Architecture
Raw input processing
Feature selection
Classification model
Random Forest Algorithm
XGBoost Algorithm
Result and Discussion
Findings
Performance measures
Conclusion
