A Node-Embedding Features Based Machine Learning Technique for Dynamic Malware Detection

Sparsh Mittal, Ashish Mittal, Sudhir Kumar Rai

doi:10.1109/dsc54232.2022.9888836

Abstract

As the malware threat grows, dynamic malware detection (DMD) has become increasingly critical. In this paper, we present a machine-learning-based DMD technique. We propose generating node embedding features (NEFs) from process execution chains. We combine NEFs with other features based on the command line, file path, and action taken by a process, and feed them to machine learning (ML) classification algorithms. We evaluate two ML classifiers, viz., light gradient boosting machine (LGBM) and XGBoost. We perform experiments on a real-world dataset provided by a leading anti-virus company. Our technique achieves high accuracy, and the use of NEFs improves the predictive performance of both classifiers. NEFs are also found to be highly important features in both algorithms.
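
To make the idea of node embedding features concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the embedding algorithm, so this sketch assumes a simple stand-in: random walks over a parent-child process graph, followed by random-indexing of co-occurring nodes. The process names, the chains, and all function names and parameters (`build_graph`, `random_walks`, `embed_nodes`, `chain_features`, `dim`, `window`) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical process execution chains (parent -> child -> ...).
# These are illustrative only and are not taken from the paper's dataset.
CHAINS = [
    ["explorer.exe", "winword.exe", "cmd.exe", "powershell.exe"],
    ["explorer.exe", "chrome.exe"],
    ["services.exe", "svchost.exe", "cmd.exe"],
]


def build_graph(chains):
    """Build undirected adjacency sets linking each process to its parent/child."""
    graph = defaultdict(set)
    for chain in chains:
        for parent, child in zip(chain, chain[1:]):
            graph[parent].add(child)
            graph[child].add(parent)
    return graph


def random_walks(graph, walks_per_node=20, walk_len=5, seed=0):
    """Generate fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                walk.append(rng.choice(sorted(graph[walk[-1]])))
            walks.append(walk)
    return walks


def embed_nodes(graph, walks, dim=8, window=2, seed=0):
    """Assumed stand-in embedding: sum random +/-1 index vectors of the
    nodes that co-occur within `window` steps on a walk, then L2-normalise."""
    rng = random.Random(seed)
    index = {n: [rng.choice((-1.0, 1.0)) for _ in range(dim)] for n in sorted(graph)}
    emb = {n: [0.0] * dim for n in graph}
    for walk in walks:
        for i, node in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    for k in range(dim):
                        emb[node][k] += index[walk[j]][k]
    for n, vec in emb.items():
        norm = sum(v * v for v in vec) ** 0.5 or 1.0  # avoid division by zero
        emb[n] = [v / norm for v in vec]
    return emb


def chain_features(chain, emb, dim=8):
    """Average the embeddings of the known processes in one execution chain."""
    vecs = [emb[p] for p in chain if p in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


graph = build_graph(CHAINS)
walks = random_walks(graph)
emb = embed_nodes(graph, walks)
nef = chain_features(CHAINS[0], emb)
print(len(nef))  # dimensionality of the chain-level feature vector
```

In a pipeline of the kind the abstract describes, a vector such as `nef` would be concatenated with the command-line, file-path, and action features before being passed to a gradient-boosting classifier such as LGBM or XGBoost; that step is omitted here to keep the sketch free of third-party dependencies.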

Full Text