A fuzzy-based ensemble model for improving malicious web domain identification

Raymond Chiong, Zuli Wang, Zongwen Fan, Sandeep Dhakal

doi:10.1016/j.eswa.2022.117243

Abstract

Accurate identification of malicious web domains is crucial for protecting users from theft of private information, malware attacks, and monetary loss. Various methods, including blacklists and machine learning-based models, have been proposed to identify malicious web domains effectively. However, maintaining an up-to-date blacklist is difficult, and standard machine learning-based models are typically sensitive to noise in the data. In this paper, we propose an ensemble model based on the fuzzy-weighted Least Squares Support Vector Machine (EFW-LS-SVM) for improving malicious web domain identification. Because different data samples may have varying importance, we apply a fuzzy-weighted operation to each data sample. This is the first time the fuzzy-weighted operation has been incorporated into an ensemble approach for malicious web domain identification. Our proposed EFW-LS-SVM delivers excellent results for identifying malicious web domains: it outperformed the compared machine learning models in terms of the F-measure score and provided the best or very competitive accuracy, of up to 94.50%, on all datasets included in our experiments. Further, considering the imbalanced nature of benign and malicious web domain data, where malicious web domains tend to be the minority, we used the Synthetic Minority Over-sampling Technique (SMOTE) to improve the performance of all models tested. Our experimental results confirm that SMOTE re-sampling can improve the performance of all the models, including our proposed EFW-LS-SVM, whose F-measure score improved by up to 3.29%.
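To illustrate the re-sampling idea mentioned above, the following is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the authors' implementation: the function name `smote_oversample`, the parameter values, and the two-dimensional feature vectors are hypothetical and chosen only for illustration. Each synthetic sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D feature vectors for the minority (malicious) class.
malicious = [(0.9, 0.1), (0.8, 0.2), (1.0, 0.3)]
new_points = smote_oversample(malicious, n_new=4)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays within the region already occupied by that class. In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written version.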

Full Text