Malicious websites are deliberately created to help online criminals carry out illicit activities, such as installing malware on a victim's computer, stealing private data from the victim's system, and exposing the victim online. Malicious code can also be found on legitimate websites. Locating such websites in cyberspace is therefore a difficult task that demands an automated detection tool. Machine learning and deep learning techniques are currently employed to detect malicious websites, but the problem persists because the attack vector is constantly changing. Most research solutions use a limited number of URL lexical features, DNS information, global ranking information, and webpage content features. Combining several derived features incurs computation time and security risk. Additionally, relying on a minimal set of features does not exploit the dataset's full potential. This paper addresses the problem using URLs exclusively, blending linguistic and vectorized URL features. The full potential of the URL is exploited through vectorization. Six machine learning algorithms are examined. The results indicate that the proposed approach performs better when the count vectorizer is used with the random forest algorithm.
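To illustrate what blending hand-crafted and vectorized URL features could look like, the following is a minimal sketch in plain Python. It is not the paper's feature set or pipeline: the five lexical features, the tokenization rule, and all function names (`lexical_features`, `tokenize`, `build_vocabulary`, `count_vector`, `combined_features`) are assumptions chosen for illustration. The count vector is a simple bag-of-tokens count, in the spirit of a count vectorizer.

```python
import re
from collections import Counter


def lexical_features(url):
    """A few hand-picked lexical features (illustrative, not the paper's list)."""
    return {
        "length": len(url),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
    }


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed tokenization rule)."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url.lower()) if tok]


def build_vocabulary(urls):
    """Map every token seen in the training URLs to a fixed column index."""
    vocab = sorted({tok for url in urls for tok in tokenize(url)})
    return {tok: idx for idx, tok in enumerate(vocab)}


def count_vector(url, vocab):
    """Count how often each vocabulary token occurs in the URL."""
    counts = Counter(tokenize(url))
    vector = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:  # tokens unseen during training are ignored
            vector[vocab[tok]] = n
    return vector


def combined_features(url, vocab):
    """Concatenate lexical features with the token-count vector."""
    return list(lexical_features(url).values()) + count_vector(url, vocab)


if __name__ == "__main__":
    # Hypothetical example URLs, not taken from any dataset.
    training_urls = [
        "http://example.com/login",
        "http://login.example.com/login-update",
    ]
    vocab = build_vocabulary(training_urls)
    for url in training_urls:
        print(url, combined_features(url, vocab))
```

A feature vector of this kind could then be passed to any classifier, for example a random forest; the classifier itself is omitted here to keep the sketch free of external dependencies.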