Abstract

Phishing attacks pose a significant threat in cyberspace and demand effective detection mechanisms. Phishing is a cybercrime in which attackers use the Internet to access personal data and violate security. Its main aim is to steal information from users, and among the many techniques used for this purpose, phishing is the primary one. This study evaluates the performance of three machine learning models, Gradient Boosting Classifier, Random Forest, and Decision Tree, in conjunction with the SelectKBest and Chi-Square feature selection techniques. With a comprehensive feature set of 30 attributes, the models achieved a baseline accuracy of 97.4%. SelectKBest and Chi-Square feature selection identified 13 key features, and retraining the models on this reduced set yielded a slightly lower accuracy of 95.6%. This research highlights the importance of feature selection in phishing detection: reducing the feature set from 30 to 13 attributes costs less than two percentage points of accuracy while maintaining model interpretability. The project is built with NumPy, Pandas, Matplotlib, Scikit-learn, and Flask, and emphasises the exploration of ML models, exploratory data analysis (EDA) on phishing data, and an understanding of feature importance. The Gradient Boosting Classifier, Random Forest, and Decision Tree models are used to classify a given URL as malicious or legitimate, using both the comprehensive feature set and the key feature set. The models and feature sets are compared on four performance metrics: accuracy, precision, recall, and F1-score. This study contributes to advancing cyber defence mechanisms through the fusion of sophisticated machine learning algorithms and meticulous feature selection methodologies.
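
The feature selection and evaluation steps summarised above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' code: it computes the per-feature chi-square statistic that scikit-learn's `chi2` scoring function is based on (the observed per-class sum of each feature compared with the sum expected if the feature were independent of the label), keeps the `k` highest-scoring features, and computes accuracy, precision, recall, and F1-score for the phishing class. The function names (`chi2_scores`, `select_k_best`, `binary_metrics`), the label convention (1 = phishing), and the six-row toy dataset are assumptions made for illustration; the study's 30-attribute dataset and its encoding are not described in the abstract. Chi-square scoring requires non-negative feature values, so features encoded with negative values would need to be shifted first.

```python
# Hypothetical sketch: chi-square feature scoring, top-k selection, and the
# four metrics named in the abstract. Plain Python, no scikit-learn needed.


def chi2_scores(X, y):
    """Chi-square score per feature: observed per-class feature sums vs. the
    sums expected if the feature were independent of the class label.
    X must contain only non-negative values."""
    n_samples = len(X)
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if any(v < 0 for v in column):
            raise ValueError("chi-square scoring needs non-negative features")
        total = sum(column)
        score = 0.0
        for c in classes:
            observed = sum(v for v, label in zip(column, y) if label == c)
            class_prob = sum(1 for label in y if label == c) / n_samples
            expected = class_prob * total
            if expected > 0:  # avoid dividing by zero for an all-zero feature
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(scores, k):
    """Indices of the k highest-scoring features, in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` as the target class
    (assumed here: 1 = phishing / malicious URL)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data, NOT the study's dataset: 3 binary features, label 1 = phishing.
X = [[1, 0, 1], [1, 1, 1], [0, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0]]
y = [1, 1, 0, 0, 1, 0]

scores = chi2_scores(X, y)
print(select_k_best(scores, k=2))  # → [0, 2]

# Hypothetical predictions from some retrained classifier.
metrics = binary_metrics(y, [1, 1, 0, 1, 1, 0])
print(metrics["precision"], metrics["recall"])  # → 0.75 1.0
```

In a scikit-learn project the equivalent steps would typically use `SelectKBest(score_func=chi2, k=13)` from `sklearn.feature_selection` and `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` from `sklearn.metrics`, with each classifier retrained on the selected columns.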
