The rapid expansion of the internet has been accompanied by a surge in malicious online activity, posing significant threats to users and organizations. Cybercriminals exploit malicious uniform resource locators (URLs) to distribute harmful content, run phishing schemes, and orchestrate a range of cyberattacks. As these threats evolve, detecting malicious URLs (MURLs) has become crucial for protecting internet users and maintaining a secure online environment. To address this need, we propose a novel machine learning-driven framework for identifying both known and unknown MURLs. Our approach uses a comprehensive dataset with labels including benign, phishing, defacement, and malware to engineer a robust feature set, validated through extensive statistical analyses. The resulting malicious URL detection system (MUDS) combines supervised machine learning techniques, tree-based algorithms, and advanced data preprocessing, and achieves a detection accuracy of 96.83% on known MURLs. For unknown MURLs, the framework employs CL_K-means, a modified k-means clustering algorithm, together with two additional biased classifiers, and achieves 92.54% accuracy on simulated zero-day datasets. With an average processing time of under 14 milliseconds per instance, MUDS is optimized for real-time integration into network endpoint systems. These results demonstrate the effectiveness and efficiency of MUDS in identifying and mitigating MURLs, thereby strengthening online security against cyber threats.
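The abstract does not list the features MUDS actually engineers. As a hedged illustration of what URL feature engineering typically looks like, the sketch below extracts a few lexical features of the kind commonly used for malicious URL detection. These include length counts, special characters, an IPv4-literal host, and character entropy. The function names (`url_features`, `shannon_entropy`) and the specific feature set are hypothetical and are not the paper's method. It uses only the Python standard library.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lexical feature set for illustration only; NOT the MUDS feature set.
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character; returns 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def url_features(url: str) -> dict:
    """Map a raw URL string to a dictionary of simple lexical features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),                 # '@' can hide the real destination
        "host_is_ipv4": int(bool(IPV4_RE.match(host))),  # raw IP instead of a domain name
        "uses_https": int(parsed.scheme == "https"),
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
        "url_entropy": round(shannon_entropy(url), 3),
    }


if __name__ == "__main__":
    for u in ["https://example.com/login",
              "http://192.168.0.1/secure-update/account@verify.php"]:
        print(u, url_features(u))
```

In a pipeline like the one the abstract describes, vectors of this kind would be the input to the supervised tree-based classifiers for known MURLs. How CL_K-means modifies standard k-means for unknown MURLs is not specified in the abstract, so it is not sketched here.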