Abstract

Malware has evolved rapidly over the past few years, posing serious threats to network security and causing significant damage. In response, researchers have proposed classification approaches for a wide range of malware variants. However, the widely used deep learning methods are fully supervised, which leads to two unavoidable problems: 1) time consumption: manually labeling data before training a fully supervised model requires substantial human effort; 2) resource redundancy: large amounts of unlabeled data go unused, which wastes resources. To address these problems, we propose MCM-SSL, a Malware Classification Method based on Semi-Supervised Learning, which divides model training into a pre-training stage that uses unlabeled data and a fine-tuning stage that uses labeled data. The method makes effective use of large amounts of unlabeled data and needs only a small amount of labeled data to achieve excellent performance. It achieves an accuracy of 90.51% on the open-source Virus-MNIST dataset, which is superior to recent state-of-the-art methods. We also verify the generality and robustness of our method using a variety of common neural network algorithms. For the same algorithm, the accuracy of the pre-trained model is on average 2.4% higher than that of the model without pre-training.

Keywords: Malware classification; Semi-supervised learning; Contrastive learning

