Definition of Phishing Sites Based on the Team Model of Fuzzy Neural Networks

Ilyas Idrisovich Ismagilov, Aynur Ayratovich Murtazin, Alexey Sergeevich Katasev, Dina Vladimirovna Kataseva, Andrey Igorevich Barinov

doi:10.29042/2020-10-5-133-140

Abstract

This paper addresses the problem of identifying phishing sites by building a team model of fuzzy neural networks (FNNs). The main phishing methods are analyzed, with attention drawn to the fact that phishing has spread widely on the Internet through the use of phishing sites, and the expediency of identifying phishing sites by analyzing their URLs is noted. The main approaches to identifying phishing sites are described, and the need for a machine learning approach is substantiated: fuzzy neural networks are constructed to create fuzzy knowledge bases, which are then used to identify phishing sites. Automating the identification of phishing sites with this neuro-fuzzy approach required collecting and preparing the initial data for analysis, building a team model of fuzzy neural networks, forming a fuzzy knowledge base, and conducting experiments to assess how accurately the constructed model identifies phishing sites. The initial data, 50,000 records in total, were gathered from various sources, and an expert selected 10 input features for analysis. Correlation analysis then narrowed these to the four most informative input features: site lifetime, site rank, URL length, and the site's registration status. The output feature is the site type: phishing or legitimate. After the quality of the selected data was assessed and the data were cleaned, the resulting sample contained 34,718 rows, of which 70% (24,303 rows) were used for training and 30% (10,415 rows) for testing. A team model of fuzzy neural networks was built on these data, and a knowledge base of 4,608 fuzzy rules was formed. Experiments showed type I errors of 2.01% and type II errors of 2.89% in identifying phishing sites, giving an overall classification error of 4.9% for the knowledge base rules.
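The data-preparation steps described above (ranking candidate features by their correlation with the phishing/legitimate label, keeping the strongest ones, then splitting the cleaned sample 70/30) can be sketched as follows. This is a minimal illustration, assuming Pearson correlation against a 0/1 label and a seeded random shuffle; the abstract names neither the correlation measure nor the splitting procedure, and the helpers `pearson`, `select_top_features`, and `train_test_split` are hypothetical (the last is not scikit-learn's function of the same name).

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)


def select_top_features(rows, labels, names, k):
    """Rank features by |r| with the 0/1 label and keep the k strongest.

    Assumption: Pearson r is used as the informativeness score; the paper
    does not state which correlation measure it applied.
    """
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts.

    Assumption: a seeded random shuffle; the paper only gives the 70/30 ratio.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Applied to a cleaned sample of 34,718 rows, `train_test_split` with the default `train_fraction=0.7` yields 24,303 training rows and 10,415 test rows, matching the counts in the abstract, because 0.7 × 34,718 = 24,302.6 rounds to 24,303. In practice `select_top_features` would be called with `k=4` over the expert's 10 candidate columns.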
The accuracy of identifying phishing sites was 95.1%, which exceeds that of the other classification methods tested: a multilayer neural network, a decision tree, and linear and logistic regression. The knowledge base formed from the team model of fuzzy neural networks can therefore be used effectively to identify phishing sites on the Internet.
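The reported figures are internally consistent if both error rates are taken as shares of the whole test set: 2.01% + 2.89% = 4.90% overall error, and 100% - 4.90% = 95.1% accuracy. The sketch below computes these quantities from a model's predictions. The abstract does not say which misclassification it calls type I; mapping type I to legitimate sites flagged as phishing (false positives) and type II to missed phishing sites (false negatives) is an assumption, as are the names `Confusion`, `confusion_from_predictions`, and `error_report`.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Outcome counts on a test set, with 1 = phishing and 0 = legitimate."""
    tp: int  # phishing sites flagged as phishing
    tn: int  # legitimate sites passed as legitimate
    fp: int  # legitimate sites flagged as phishing (assumed to be "type I")
    fn: int  # phishing sites passed as legitimate (assumed to be "type II")

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn


def confusion_from_predictions(y_true, y_pred) -> Confusion:
    """Tally the four outcome types from paired true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def error_report(c: Confusion) -> dict:
    """Express both error types as shares of the whole test set.

    With this normalization the two shares add up to the overall
    classification error, and accuracy is its complement.
    """
    n = c.total
    type1 = c.fp / n
    type2 = c.fn / n
    overall = type1 + type2
    return {"type1": type1, "type2": type2, "overall": overall, "accuracy": 1.0 - overall}
```

Normalizing by the whole test set, rather than by the number of legitimate or phishing sites, is what makes the two error rates additive; rates normalized per class (false positive rate, false negative rate) would not sum to the overall error.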
