Classification of malicious and benign websites by network features using supervised machine learning algorithms

Sanaa Kaddoura

doi:10.1109/csnet52717.2021.9614273

Abstract

Internet usage has grown over the past years, and cyber-attacks have increased rapidly with it, causing substantial losses of personal information and money. Cyber-attacks include phishing, spamming, and malware. Because websites are the most widely used element of the Internet, attackers frequently choose them as targets. Detecting malicious websites is therefore critical for the security of organizations and individuals, and the earlier a malicious website is detected, the sooner users can be protected against it. In this paper, a dataset is analyzed and used to train multiple supervised machine learning models, including Random Forest, Artificial Neural Network, K-nearest neighbors, and Support Vector Machine. The dataset attributes are extracted from the application layer and from different network characteristics. Experiments on many benign and malicious websites obtained from real-life Internet resources show high prediction performance. Because the dataset studied in this paper is imbalanced, the F1-score was measured instead of accuracy. The Support Vector Machine algorithm showed the highest performance of all the algorithms studied, with a score of 92%.
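The choice of F1-score over accuracy for an imbalanced dataset can be illustrated with a minimal sketch. The labels below are hypothetical toy data (90 benign, 10 malicious), not the dataset used in the paper, and the helper names `accuracy` and `f1_score` are defined here only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced labels: 0 = benign, 1 = malicious.
y_true = [0] * 90 + [1] * 10

# A trivial classifier that labels every website as benign.
always_benign = [0] * 100

# A classifier that finds 8 of the 10 malicious sites
# and raises 2 false alarms on benign sites.
detector = [0] * 88 + [1] * 2 + [1] * 8 + [0] * 2

acc_trivial = accuracy(y_true, always_benign)
f1_trivial = f1_score(y_true, always_benign)
acc_detector = accuracy(y_true, detector)
f1_detector = f1_score(y_true, detector)

print(f"always benign: accuracy={acc_trivial:.2f}, F1={f1_trivial:.2f}")
print(f"detector:      accuracy={acc_detector:.2f}, F1={f1_detector:.2f}")
```

On this toy data the trivial classifier still reaches 90% accuracy while its F1-score is zero, because it never detects a malicious site. This is why accuracy alone can be misleading when one class dominates.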

Full Text