Phishing Detection: A Hybrid Model with Feature Selection and Machine Learning Techniques

Dhyan Chandra Yadav, Mithilesh Kumar Pandey, Rekha Pal, Saurabh Pal

doi:10.52756/ijerr.2023.v36.009

Open Access

https://doi.org/10.52756/ijerr.2023.v36.009

Abstract

Phishing attacks in cyberspace are increasing as information technology advances. Phishing is a prominent cyber-attack rooted in social engineering: it aims to deceive individuals into divulging sensitive information, including credit card details, login credentials, and passwords. The goal of this research is to identify the best-performing phishing detection model among several machine learning (ML) techniques. For feature selection, this paper uses an Extra Tree Classifier (ETC), forward selection, Pearson correlation, a logit (LR) model, and principal component analysis (PCA). The phishing detection model is developed with Logistic Regression (LR), Naïve Bayes (NB), Decision Tree (DT), K-Nearest Neighbor (K-NN), Support Vector Machine (SVM), Random Forest (RF), AdaBoost, and Bagging classifiers. We study the model in four cases: case 1 uses the 6 features selected in common by ETC, forward selection, and Pearson correlation; case 2 uses 25 features selected by the logit model; case 3 uses all features; and case 4 uses principal component analysis (with 3 and 5 components). The highest accuracy, 97.3%, is obtained in case 2 with the random forest model.
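
The idea behind case 1, keeping only the features that several selectors agree on, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the feature names (has_ip_address, url_length, uses_https, random_flag), and the two sets standing in for the Extra Tree and forward-selection outputs are all hypothetical. Only the Pearson ranking is actually computed here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def top_k_by_pearson(columns, labels, k):
    """Return the k feature names with the largest |correlation| to the label."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], labels)),
        reverse=True,
    )
    return set(ranked[:k])

def common_features(*selections):
    """Features chosen by every selector (the 'commonly selected' set)."""
    return set.intersection(*map(set, selections))

# Toy data with hypothetical feature names; 1 = phishing, 0 = legitimate.
labels = [1, 1, 0, 0, 1, 0]
columns = {
    "has_ip_address": [1, 1, 0, 0, 1, 0],
    "url_length":     [1, 0, 0, 0, 1, 0],
    "uses_https":     [0, 0, 1, 1, 0, 1],
    "random_flag":    [1, 0, 1, 0, 0, 1],
}

pearson_set = top_k_by_pearson(columns, labels, k=3)

# Assumed placeholders for the outputs of the other two selectors.
extra_tree_set = {"has_ip_address", "uses_https", "random_flag"}
forward_set = {"has_ip_address", "uses_https", "url_length"}

case1_features = common_features(pearson_set, extra_tree_set, forward_set)
print(sorted(case1_features))
```

In a real pipeline, the two placeholder sets would come from a fitted tree ensemble's feature importances and from a forward-selection wrapper, and the resulting feature subset would then be passed to each classifier for comparison.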

Full Text