Comparative Analysis of Various Ensemble Algorithms for Computer Malware Prediction

Yusuf Bayu Wicaksono, Christina Juliane

doi:10.29207/resti.v7i3.4492

Open Access

Abstract

An estimated 29 billion devices were expected to be connected to the internet by 2022, which makes cybercrime a major threat. One of the most common forms of cybercrime is infection with malicious software (malware) designed to harm end users. Microsoft has the highest number of vulnerabilities among software companies, and its operating system, Windows, accounts for the largest share of them at 68.85%. Most malware research addresses infections after malware has already reached a user's device. This study takes the opposite approach: it predicts the likelihood of malware infection on a user's device before the infection occurs. Similar studies have relied on single algorithms, whereas this study uses ensemble algorithms, which handle the bias-variance trade-off better. The study builds models from data on computer features that affect the likelihood of malware infection on devices running Microsoft Windows, using ensemble algorithms such as Bagging Classifier, Random Forest, Light Gradient Boosting Machine, Extreme Gradient Boosting Machine, Category Boosting, and Stacking Classifier. The best model is a Stacking Classifier that combines Light Gradient Boosting Machine and Category Boosting Classifier, with training and test scores of 0.70665 and 0.64694, respectively. The study also identifies important features that can serve as a reference for policies to protect user devices from malware infection.
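
To make the stacking idea in the abstract concrete, the following is a minimal, standard-library-only sketch of how a stacking classifier combines base models through a meta-learner. It is not the authors' pipeline. Everything in it is an assumption made for illustration: the synthetic data from `make_data`, the toy `Stump` base learners standing in for Light Gradient Boosting Machine and Category Boosting, the `LogisticMeta` meta-learner, the 5-fold scheme, and the use of accuracy as the score (the abstract does not name its metric).

```python
import math
import random


def make_data(n, seed):
    """Synthetic binary data (illustrative only, not the paper's dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([a, b])
        y.append(1 if a + b + rng.gauss(0, 0.5) > 0 else 0)
    return X, y


class Stump:
    """Toy base learner: splits one feature at its mean, predicts P(y=1) per side."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        vals = [row[self.feature] for row in X]
        self.threshold = sum(vals) / len(vals)
        left = [t for v, t in zip(vals, y) if v <= self.threshold]
        right = [t for v, t in zip(vals, y) if v > self.threshold]
        # Laplace smoothing keeps the estimates away from 0 and 1.
        self.p_left = (sum(left) + 1) / (len(left) + 2)
        self.p_right = (sum(right) + 1) / (len(right) + 2)
        return self

    def predict_proba(self, X):
        return [self.p_left if r[self.feature] <= self.threshold else self.p_right
                for r in X]


class LogisticMeta:
    """Meta-learner: logistic regression fitted by batch gradient descent."""

    def __init__(self, lr=1.0, epochs=1000):
        self.lr, self.epochs = lr, epochs

    def _proba(self, z):
        s = self.b + sum(w * v for w, v in zip(self.w, z))
        return 1.0 / (1.0 + math.exp(-s))

    def fit(self, Z, y):
        n, d = len(Z), len(Z[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for z, t in zip(Z, y):
                err = self._proba(z) - t
                gb += err
                for i in range(d):
                    gw[i] += err * z[i]
            self.b -= self.lr * gb / n
            self.w = [w - self.lr * g / n for w, g in zip(self.w, gw)]
        return self

    def predict(self, Z):
        return [1 if self._proba(z) >= 0.5 else 0 for z in Z]


def fit_stacking(X, y, factories, k=5):
    """Train the meta-learner on out-of-fold base predictions, then refit bases."""
    n = len(X)
    oof = [[0.0] * len(factories) for _ in range(n)]
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        val_idx = [i for i in range(n) if i % k == fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        X_val = [X[i] for i in val_idx]
        for j, make in enumerate(factories):
            preds = make().fit(X_tr, y_tr).predict_proba(X_val)
            for i, p in zip(val_idx, preds):
                oof[i][j] = p
    meta = LogisticMeta().fit(oof, y)
    bases = [make().fit(X, y) for make in factories]
    return bases, meta


def predict_stacking(bases, meta, X):
    """Feed each base model's probability to the meta-learner as one feature."""
    columns = [m.predict_proba(X) for m in bases]
    return meta.predict([list(row) for row in zip(*columns)])


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


X_train, y_train = make_data(300, seed=0)
X_test, y_test = make_data(400, seed=1)
factories = [lambda: Stump(0), lambda: Stump(1)]  # hypothetical stand-in base models
bases, meta = fit_stacking(X_train, y_train, factories)
train_acc = accuracy(y_train, predict_stacking(bases, meta, X_train))
test_acc = accuracy(y_test, predict_stacking(bases, meta, X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Training the meta-learner on out-of-fold predictions, rather than on predictions the base models make for rows they were fitted on, is a common way to limit overfitting in stacking. Reporting both a training and a test score, as the abstract does, shows how large the remaining gap is. In practice the toy learners would be replaced by library implementations of the boosting models.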
