Abstract

Phishing is a prevalent cyber attack in which deceptive websites mimic legitimate ones to trick individuals into revealing personal information such as usernames, passwords, and financial details. Attackers favor phishing because authentic-looking but malicious links are effective at deceiving victims and can bypass security measures. Machine learning algorithms are effective tools for detecting such sites. However, attackers can manipulate features such as HTML, DOM, and URLs using web scraping and scripting languages. This work proposes an approach that counters these threats with machine learning classifiers that analyze URLs and domain names. A dataset sourced from globally recognized intelligence services and organizations supports streamlined feature extraction, and prioritizing URL and domain name traits reduces processing overhead. A Gradient Boosting Classifier is applied to a dataset of 11,055 instances with thirty-two features to classify phishing URLs, and it demonstrates higher accuracy than methods such as Random Forest. Gradient boosting aggregates weak learners such as decision trees into a strong predictor and is effective across a wide range of machine learning tasks. Its suitability for imbalanced datasets makes it particularly well suited to phishing detection, where legitimate and malicious URLs must be distinguished. The method improves accuracy by extracting and comparing the distinct characteristics of legitimate and phishing URLs. By focusing on URL and domain name attributes, it offers a more effective approach to identifying phishing attempts.
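
As an illustration of the kind of URL and domain name traits discussed above, the following minimal sketch derives a few lexical features from a URL string using only the Python standard library. The function name `extract_url_features` and the six features it returns are assumptions chosen for illustration; the abstract does not enumerate the thirty-two features of the dataset, so this should not be read as the feature set actually used.

```python
from urllib.parse import urlparse
import ipaddress


def extract_url_features(url: str) -> dict:
    """Return a few illustrative lexical features of a URL.

    These features are hypothetical examples, not the thirty-two
    features of the dataset described in the abstract.
    """
    # Add a scheme when none is given so that urlparse finds the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a common warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(host_is_ip),
        # An "@" anywhere in the URL can hide the real destination host.
        "has_at_symbol": int("@" in url),
        # Count labels beyond "name.tld"; not meaningful for IP hosts.
        "subdomain_count": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "has_hyphen_in_host": int("-" in host),
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/",
                    "http://192.168.0.1/login@secure"):
        print(example, extract_url_features(example))
```

Feature dictionaries of this kind could then be converted to numeric rows and passed to a classifier such as gradient boosting. Because the features come from the URL string alone, no page content needs to be fetched, which is consistent with the reduced processing overhead described above.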

