Abstract

This study aimed to develop a robust machine-learning-based phishing detection system using K-nearest neighbour (KNN), artificial neural network (ANN), and random forest (RF) algorithms. It used datasets from Ariyadasa et al. (2021) and UNB (2016) to identify patterns that distinguish legitimate websites from phishing websites. A further objective was to integrate the best-performing model into a Django-based web application for real-time phishing detection. A comprehensive literature review of phishing detection techniques was also undertaken. The chosen datasets underwent rigorous pre-processing to address missing values and class imbalance. Feature selection was performed both manually and automatically, the latter using mutual information classification. The hyper-parameters of the three algorithms (RF, KNN, and ANN) were optimised using GridSearchCV. RF achieved an accuracy of 99.78%, KNN 99.67%, and ANN 99.11%. The RF and KNN models correctly identified every legitimate website, whereas the ANN model correctly identified every phishing website. The RF model, having the highest accuracy, was integrated into a Django application that provides a user interface for real-time phishing detection. All three models achieved high accuracy, demonstrating their efficacy for phishing detection; although RF was deployed in this study, the choice between models depends on specific user or business requirements and priorities. Among the recommendations for future work, feedback mechanisms within the Django application could support further refinement of the system. The study provides a foundational step toward enhancing web safety through effective phishing detection.
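Mutual information scores a feature by how much knowing its value reduces uncertainty about the class label (phishing vs legitimate). The abstract does not state which implementation was used; scikit-learn's `mutual_info_classif` is a common choice, but that is an assumption. The minimal sketch below computes the plug-in estimate I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))] by hand, assuming discrete feature values. The feature names and toy data are hypothetical and are not taken from the Ariyadasa et al. (2021) or UNB (2016) datasets.

```python
from collections import Counter
from math import log


def mutual_information(feature, labels):
    """Plug-in estimate of I(X; Y) in nats for discrete feature values and labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of each (x, y) pair
    px = Counter(feature)                  # marginal counts of feature values
    py = Counter(labels)                   # marginal counts of class labels
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(rows, labels, names):
    """Return (name, score) pairs sorted from most to least informative."""
    columns = zip(*rows)  # transpose per-sample rows into per-feature columns
    scores = [(name, mutual_information(col, labels)) for name, col in zip(names, columns)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: label 1 = phishing, 0 = legitimate.
names = ["has_ip_in_url", "long_url", "random_noise"]
rows = [
    (1, 1, 0), (1, 1, 1), (1, 1, 0), (1, 0, 1),
    (0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 1, 1),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

ranking = rank_features(rows, labels, names)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In a pipeline like the one described, features scoring near zero (here the hypothetical `random_noise`) would be candidates for removal before the RF, KNN, and ANN models are tuned, while a feature that perfectly separates the classes scores ln 2 for a balanced binary label.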
