Abstract
The rapid evolution of malware in the past few years has caused serious threats and damage to network security. In response, researchers have proposed classification approaches for various malware variants. However, the widely used deep learning methods are fully supervised, which leads to two unavoidable problems: 1) time-consuming: manually labeling data before training fully supervised models requires huge manual effort; 2) resource redundancy: a large amount of unlabeled data is not fully used, which wastes resources. To solve these problems, we propose a Malware Classification Method based on Semi-Supervised Learning, namely MCM-SSL, which divides model training into a pre-training stage using unlabeled data and a fine-tuning stage using labeled data. The proposed method makes effective use of a large amount of unlabeled data and needs only a small amount of labeled data to achieve excellent performance. Our method achieves an accuracy of 90.51% on the open-source Virus-MNIST dataset, which is superior to recent state-of-the-art methods. We also verify the generality and robustness of our method using a variety of common neural network algorithms. For the same algorithm, the accuracy of the pre-trained model is on average 2.4% higher than that of the model without pre-training.
Keywords: Malware classification; Semi-supervised learning; Contrastive learning
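The abstract does not state the pre-training objective, but the keywords list contrastive learning. As an illustration only, the following minimal sketch shows one common contrastive objective, the NT-Xent loss used in SimCLR-style pre-training on unlabeled data. It is an assumption, not the loss confirmed for MCM-SSL; the names `l2_normalize` and `nt_xent_loss` and the default temperature of 0.5 are hypothetical choices made for this example, and the embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss for a batch of paired embeddings.

    view_a[i] and view_b[i] are assumed to be embeddings of two augmented
    views of the same unlabeled sample; every other embedding in the batch
    is treated as a negative. No labels are needed.
    """
    n = len(view_a)
    z = [l2_normalize(v) for v in view_a + view_b]  # 2n unit vectors
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of sample i
        # Temperature-scaled similarity to every embedding except itself.
        sims = {
            k: sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
            if k != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        total += log_denominator - sims[positive]
    return total / (2 * n)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    matching_views = [[1.0, 0.1], [0.1, 1.0]]
    swapped_views = [[0.1, 1.0], [1.0, 0.1]]
    print("aligned pairs:", nt_xent_loss(anchors, matching_views))
    print("swapped pairs:", nt_xent_loss(anchors, swapped_views))
```

In a two-stage setup like the one described, an encoder would first be trained to minimize such a loss on unlabeled samples, and then a classifier head would be fine-tuned on the small labeled set. The loss is lower when the two views of each sample are embedded close together, which is what the aligned-versus-swapped comparison above demonstrates.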