PHISHING DETECTION SYSTEM USING MACHINE LEARNING

Dr Sivakumar, R

doi:10.55041/ijsrem32513

Abstract

This study addresses phishing, a prevalent form of cybercrime conducted over the internet. Although phishing first appeared in 1996, it has since evolved into one of the most severe online threats. It relies on deceptive email, often coupled with fraudulent websites, to trick individuals into divulging sensitive information. Various studies have explored preventive measures and detection techniques, but no comprehensive solution exists yet, which motivates the use of machine learning to combat such attacks. The study uses a phishing URL dataset sourced from a well-known repository, comprising attributes of both phishing and legitimate URLs collected from over 11,000 websites. After data preprocessing, several machine learning algorithms are applied to detect phishing URLs and safeguard users: decision tree (DT), logistic regression (LR), random forest (RF), naive Bayes (NB), gradient boosting classifier (GBM), K-neighbors classifier (KNN), support vector classifier (SVC), and a novel hybrid model, LSD, which combines logistic regression, support vector classifier, and decision tree (LR+SVC+DT) through soft and hard voting. In addition, canopy feature selection, cross-fold validation, and grid search hyperparameter optimization are applied to the proposed LSD model. The proposed approach is evaluated using precision, accuracy, recall, F1-score, and specificity.
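
To make the voting mechanisms and the evaluation metrics named in the abstract concrete, the following is a minimal sketch in plain Python, not the study's implementation. It assumes three base models (standing in for LR, SVC, and DT) that each output a phishing probability per URL, with label 1 meaning phishing and 0 meaning legitimate. The function names (`hard_vote`, `soft_vote`, `evaluate`) and the toy probabilities are hypothetical and chosen only for illustration; they are not data or results from the study.

```python
from collections import Counter


def hard_vote(probabilities, threshold=0.5):
    """Hard voting: each model casts a 0/1 vote, the majority label wins."""
    votes = [1 if p >= threshold else 0 for p in probabilities]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probabilities, threshold=0.5):
    """Soft voting: average the models' probabilities, then threshold."""
    mean_p = sum(probabilities) / len(probabilities)
    return 1 if mean_p >= threshold else 0


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred):
    """Compute the five metrics from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "accuracy": _safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "specificity": _safe_div(tn, tn + fp),
    }


# Hypothetical toy data: one row per URL, one column per base model
# (assumed order: LR, SVC, DT). These numbers are made up.
model_probs = [
    (0.90, 0.80, 1.0),
    (0.60, 0.55, 0.0),
    (0.40, 0.45, 1.0),
    (0.10, 0.20, 0.0),
    (0.70, 0.60, 0.0),
    (0.20, 0.30, 0.0),
]
y_true = [1, 1, 1, 0, 0, 0]

hard_pred = [hard_vote(row) for row in model_probs]
soft_pred = [soft_vote(row) for row in model_probs]
hard_metrics = evaluate(y_true, hard_pred)
soft_metrics = evaluate(y_true, soft_pred)

if __name__ == "__main__":
    print("hard voting:", hard_pred, hard_metrics)
    print("soft voting:", soft_pred, soft_metrics)
```

The two schemes can disagree on the same inputs: hard voting counts only which side of the threshold each model falls on, while soft voting lets one confident model outweigh two lukewarm ones. In practice a library ensemble (for example, a voting classifier wrapped in a grid search) would likely replace these hand-written functions.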

Full Text