Abstract

Malicious Uniform Resource Locators (URLs) remain one of the most common threats to cybersecurity. They are commonly spread through phishing, malware and spam. One popular way to detect malicious URLs is through blacklists, which maintain records of URLs already known to be malicious. These lists fall short, however, when newly generated malicious URLs must be detected. For that reason, modern research has turned to training machine learning algorithms to detect malicious URLs. In this paper, we contribute to the detection of malicious URLs using URL-based features in a multiclass classification setting. We focus on three popular URL attack types: phishing, spam and malware. Our work can be used as a supplementary tool in new or existing anti-phishing, anti-spam and anti-malware detection platforms. We compared the performance of the following ensemble learners: Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Light Gradient Boosting Machine (LightGBM) and Categorical Boosting (CatBoost). We also evaluated a set of URL features that we refer to as our features. These included priority features such as Kullback-Leibler divergence (KL divergence), bag-of-words segmentation and other word-based features. Results showed that including our features improved performance compared with experiments conducted without them. We trained these algorithms on 126 983 URLs from benchmark datasets, and all four learners returned an overall accuracy above 0.95.
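
The abstract does not state how the KL divergence feature is computed. As a minimal sketch only, assuming the feature compares the character distribution of a URL with a reference distribution estimated from benign text, it could look like the following. The names `ALPHABET`, `REFERENCE_TEXT`, `char_distribution`, `kl_divergence` and `url_kl_feature`, the placeholder reference text and the add-one smoothing are all illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter

# Hypothetical alphabet: lowercase letters and digits only (an assumption).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

# Hypothetical stand-in for a benign reference corpus (an assumption).
REFERENCE_TEXT = "the quick brown fox jumps over the lazy dog example news shop mail"


def char_distribution(text, alphabet=ALPHABET, smoothing=1.0):
    """Smoothed relative frequency of each alphabet character in text."""
    counts = Counter(ch for ch in text.lower() if ch in alphabet)
    total = sum(counts.values()) + smoothing * len(alphabet)
    # Additive smoothing keeps every probability strictly positive,
    # so the logarithm in the KL divergence is always defined.
    return {ch: (counts[ch] + smoothing) / total for ch in alphabet}


def kl_divergence(p, q):
    """KL(P || Q) in nats for two distributions over the same keys."""
    return sum(p[ch] * math.log(p[ch] / q[ch]) for ch in p)


def url_kl_feature(url, reference_text=REFERENCE_TEXT):
    """Divergence of a URL's character distribution from the reference."""
    p = char_distribution(url)
    q = char_distribution(reference_text)
    return kl_divergence(p, q)
```

Under these assumptions, a call such as `url_kl_feature("http://example.com/login")` returns a single non-negative number that could be appended to a feature vector alongside the word-based features before training a boosted classifier.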

