Malware detection using different supervised learning methods

Kailin Lyu, Fengning Yang, Luning Zhang

doi:10.54254/2755-2721/5/20230618

Abstract

Malware poses a threat to cyber security that needs to be addressed. Malware detection has advanced significantly through the use of machine learning techniques. However, simple supervised learning is rarely used for malware detection. The goal of this paper is to compare the accuracy of different supervised learning models in malware detection. The experiment loads a dataset containing malware and its corresponding characteristics, trains four models separately (a decision tree, logistic regression, Gaussian Naïve Bayes, and Multinomial Naïve Bayes), and compares their train scores, test scores, and test time. The experimental results show that the decision tree model performs best in malware detection, followed by the logistic regression model and the Gaussian Naïve Bayes model, which have similar scores. In contrast, the Multinomial Naïve Bayes model has lower scores. Moreover, the logistic regression model is significantly slower than the other models. These results reflect the differing efficiency and accuracy of supervised learning methods when used for malware detection. A possible reason for the differences between the methods is the complexity of the dataset. In conclusion, supervised learning could play a crucial role in cyber security.
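The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the malware dataset, and the helper name `compare_models`, the 75/25 split, the min-max scaling step, and all hyperparameters are assumptions made for the example. "Test time" is interpreted here as the time taken to score the held-out split, which may differ from the paper's definition.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Return {model name: (train score, test score, test time in seconds)}.

    Hypothetical helper for illustration; not taken from the paper.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    # MultinomialNB rejects negative feature values, so rescale every feature
    # to [0, 1] using statistics from the training split only (assumption:
    # the paper does not state its preprocessing).
    scaler = MinMaxScaler(clip=True).fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gaussian_nb": GaussianNB(),
        "multinomial_nb": MultinomialNB(),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        train_score = model.score(X_train, y_train)  # accuracy on training data

        # Time only the evaluation on the held-out split.
        start = time.perf_counter()
        test_score = model.score(X_test, y_test)
        test_time = time.perf_counter() - start

        results[name] = (train_score, test_score, test_time)
    return results


# Synthetic stand-in data (an assumption; this is not the paper's dataset).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=0
)
results = compare_models(X, y)
for name, (train_score, test_score, test_time) in results.items():
    print(f"{name}: train={train_score:.3f} test={test_score:.3f} "
          f"test_time={test_time:.4f}s")
```

On synthetic data the ranking of the models need not match the paper's findings; the sketch only shows the shape of the experiment, with each model fitted on the same split and scored in the same way.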

Full Text