Characteristics of Understanding URLs and Domain Names Features: The Detection of Phishing Websites With Machine Learning Methods

Ilker Kara, Murathan Ok, Ahmet Ozaday

doi:10.1109/access.2022.3223111

Open Access

https://doi.org/10.1109/access.2022.3223111

Journal: IEEE Access
Publication Date: Jan 1, 2022
Citations: 3
License type: CC BY-NC-ND 4.0

Affiliation: Hacettepe University, Çankırı Karatekin University

Abstract

Advances in the means of communication have also prompted the emergence of more harmful and challenging websites in information systems and electronic devices. According to current estimates, compiling detailed information on attackers requires a very large budget. Furthermore, approaches in the literature that rely only on HTML-, DOM-, and URL-based features are easily manipulated by attackers. To respond to these attacks, we propose a new method that detects phishing websites by classifying the URLs and domain names of websites with six different classifier algorithms according to eleven predetermined features. For this method, we created a previously unused list, obtained by analyzing an index built from information provided by internationally reputable intelligence services and organizations. By considering both URLs and domain names, the proposed method simplifies feature extraction and reduces processing overhead while going beyond the analysis of HTML-, DOM-, and URL-based features. Because it achieved the highest accuracy among the six classifiers, we preferred the Random Forest (RF) algorithm. In this study, we use a dataset of 32,928 records, of which 12,134 are non-phishing websites and 20,614 are phishing websites, labeled according to eleven predetermined features. Our experimental results show that the proposed method can detect phishing websites with up to 98.90% accuracy. As a result, it has been demonstrated that RF descriptors with SVM representation can be used to accurately mark phishing web pages. In addition, changes in these characteristics can be tracked through a continuously updated source.
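The abstract does not list the eleven features, so the following Python sketch only illustrates the general idea of deriving lexical features from a URL and its domain name. The function name `extract_features` and every feature in it are assumptions chosen for illustration, not the authors' feature set.

```python
from urllib.parse import urlsplit
import ipaddress


def extract_features(url: str) -> dict:
    """Return illustrative lexical features for one URL.

    These features are hypothetical examples; they are NOT the
    eleven features used in the paper.
    """
    # urlsplit only fills in the host when a scheme is present.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""

    # A raw IP address in place of a domain name is a common warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    labels = host.split(".") if host else []
    return {
        "url_length": len(url),
        "host_length": len(host),
        # Naive count: treats the last two labels as the registered
        # domain, so multi-part suffixes such as "co.uk" are miscounted.
        "num_subdomains": 0 if host_is_ip else max(len(labels) - 2, 0),
        "host_is_ip": host_is_ip,
        "has_at_symbol": "@" in url,
        "num_hyphens_in_host": host.count("-"),
        "num_digits_in_host": sum(c.isdigit() for c in host),
        "uses_https": parts.scheme == "https",
    }


if __name__ == "__main__":
    for example in ("https://secure-login.example.com/a",
                    "http://192.168.0.1/login"):
        print(example, extract_features(example))
```

Feature dictionaries of this kind could then be converted to numeric vectors and passed to any standard classifier, for example a Random Forest implementation, for training on labeled phishing and non-phishing URLs.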

Full Text