Abstract

Phishing is one of the easiest ways to gather sensitive information from unwary people. Phishers seek private data such as passwords, login information, and bank account details, and cyber security experts are actively seeking trustworthy and effective ways to identify phishing websites. In this research work, we used machine learning (ML) to distinguish legitimate URLs from phishing URLs, extracting and analysing features from both types of URL. Extreme Gradient Boosting (XGBoost), Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM) classifiers were used to identify phishing websites. The goal was to detect phishing URLs and to determine the most effective ML technique by comparing the accuracy of each algorithm. The proposed methodology used two datasets, Phishtank and UCI, and model accuracy was calculated on each using k-fold cross-validation, feature selection, and hyperparameter tuning. The performance measures precision, recall, and F1-score were calculated, along with the Receiver Operating Characteristic (ROC) curve. RF achieved an accuracy of 98.80% on the Phishtank dataset and 97.87% on the UCI dataset. The highest precision, recall, and F1-score values were 99% each, and the highest AUC-ROC value was 99.89%, all on the Phishtank dataset. Compared with results reported by other researchers, the proposed methodology performed better. This methodology can therefore help to identify phishing websites.
