Detection of malicious websites across multiple classes using n-gram features and VGG based on URL analysis

Qichen Liu

doi:10.54254/2755-2721/18/20230965

Abstract

Due to the ubiquity of the internet, cyber-attacks delivered through websites have become a frequent and severe problem that causes substantial financial damage. Detecting malicious URLs is one of the most common countermeasures, and it is both widely deployed in commercial products and actively researched. Inspired by related work on URL classification with n-gram techniques and on convolutional neural networks in other research areas, this paper develops a method for detecting malicious websites that combines n-gram statistical features of URLs with a VGG-style neural network. The method aims to classify websites into multiple classes and to accept URLs of arbitrary length as input. Experimental results show that the proposed method achieves an average accuracy of 96.60% on the 5-class ISCX-URL2016 dataset and 96.33% on the 4-class Malicious URLs dataset, which is 1.5 times larger than the former. A further comparison shows that these accuracies are competitive with those of similar binary classification methods that use either n-gram features or a VGG-based network.
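
As an illustration of how n-gram statistics can turn a URL of arbitrary length into a fixed-size input, the following minimal Python sketch counts overlapping character n-grams and folds their relative frequencies into a fixed number of buckets. The function names, the bucket count `dim`, and the use of CRC32 hashing to assign n-grams to buckets are assumptions made for this example only; they are not taken from the paper and may differ from the feature construction it actually uses.

```python
from collections import Counter
from typing import List
import zlib


def char_ngrams(url: str, n: int) -> Counter:
    """Count overlapping character n-grams in a URL string."""
    return Counter(url[i:i + n] for i in range(len(url) - n + 1))


def ngram_feature_vector(url: str, n: int = 2, dim: int = 64) -> List[float]:
    """Map a URL of any length to a vector of exactly `dim` values.

    Assumption for illustration: each n-gram is assigned to a bucket by a
    CRC32 hash, and each bucket accumulates the relative frequency of the
    n-grams that fall into it.
    """
    counts = char_ngrams(url, n)
    total = sum(counts.values())
    vec = [0.0] * dim
    if total == 0:
        # The URL is shorter than n characters, so it has no n-grams.
        return vec
    for gram, count in counts.items():
        bucket = zlib.crc32(gram.encode("utf-8")) % dim
        vec[bucket] += count / total
    return vec
```

Because the output length depends only on `dim`, URLs of different lengths yield vectors of the same size, which is the property a fixed-input network such as a VGG-style model requires.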

Full Text