From Data and Model Levels: Improve the Performance of Few-Shot Malware Classification

Lihua Yin,Zhihong Tian,Jing Qiu,Lejun Zhang,Brij B Gupta,Yuhan Chai

doi:10.1109/tnsm.2022.3200866

Abstract

Existing malware classification methods cannot handle the open-ended growth of new or unknown malware well because it only focuses on pre-defined malware classes with sufficient training data. Due to the superiority of the visualization method, some researchers use it for solving few-shot malware classification. However, the malware images generated by existing visualization methods contain insufficient semantic information. At the same time, existing few-shot models tend to converge to sharp minima resulting in poor generalization performance. By synthesizing the observations, we think that accurate and effective few-shot malware classification methods are affected by generated malware images and classification models, which can be called data and model levels, respectively. To solve the above problems, we propose a novel method from the Data and Model levels, which is used to classify new or unknown malware well, called DMMal. More specifically, we propose a multi-channel malware image generation method based on multi-view so that malware images can contain more prosperous information at the data level. In addition, we investigated adaptive sharpness-aware minimization in a few-shot scenario from the perspective of model optimization at the model level to minimize the loss value and sharpness simultaneously. This enhances the generalization ability of the model and improves the ability of the model to classify new or unknown classes. Experiments on two few-shot malware classification datasets show that the method proposed can improve the performance of few-shot malware classification from the data and model levels.

Full Text