Abstract

The emergence of large numbers of new malicious code samples poses a serious threat to network security, and most of these samples are derivatives of existing malicious code. Classifying malicious code helps to analyze the evolutionary trends of malware families and to trace the source of cybercrime. Existing malware classification methods emphasize the depth of the neural network, which leads to long training times and high computational cost. In this work, we propose the shallow neural network-based malware classifier (SNNMAC), a malware classification model based on shallow neural networks and static analysis. Our approach bridges the gap between precise but slow methods and fast but less precise methods in existing work. For each sample, we first use a decompiler to extract the opcode sequence of the binary file and generate n-grams from it. An improved n-gram algorithm based on control transfer instructions is designed to reduce the size of the n-gram dataset. The SNNMAC then uses a shallow neural network, in which the fully connected layer and softmax are replaced with an average pooling layer and hierarchical softmax, to learn from the dataset and perform classification. We perform experiments on the Microsoft malware dataset. The evaluation results show that the SNNMAC outperforms most related work, with 99.21% classification precision, and reduces the training time by more than half compared with methods using deep neural networks (DNNs).
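
The n-gram step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's CTIB-n-gram algorithm, whose details are not given in this excerpt. It assumes one plausible reading: the opcode sequence is cut after each control transfer instruction, and n-grams are generated only inside each resulting block, which yields fewer n-grams than a plain sliding window. The mnemonic set `CONTROL_TRANSFER` and all function names are hypothetical.

```python
from typing import List, Tuple

# Assumption: an illustrative, incomplete set of x86 control transfer mnemonics.
CONTROL_TRANSFER = {"jmp", "je", "jne", "jz", "jnz", "call", "ret", "retn", "loop"}


def ngrams(opcodes: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of an opcode sequence (plain sliding window)."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def split_on_control_transfer(opcodes: List[str]) -> List[List[str]]:
    """Split the sequence into blocks, each ending at a control transfer instruction."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for op in opcodes:
        current.append(op)
        if op in CONTROL_TRANSFER:
            blocks.append(current)
            current = []
    if current:  # trailing instructions with no closing control transfer
        blocks.append(current)
    return blocks


def segmented_ngrams(opcodes: List[str], n: int) -> List[Tuple[str, ...]]:
    """Generate n-grams that never span a control transfer boundary."""
    grams: List[Tuple[str, ...]] = []
    for block in split_on_control_transfer(opcodes):
        grams.extend(ngrams(block, n))
    return grams


# Hypothetical opcode sequence, for illustration only.
sample = ["push", "mov", "call", "mov", "cmp", "jne", "pop", "ret"]
plain = ngrams(sample, 3)
segmented = segmented_ngrams(sample, 3)
print(len(plain), len(segmented))
```

On this toy sequence the plain sliding window produces six 3-grams, while the segmented variant keeps only the two that lie entirely inside a block. That is the kind of dataset reduction the abstract refers to, under the stated assumption.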

Highlights

  • Malware has long been one of the main threats to cybersecurity, and the detection and analysis of malicious code have attracted sustained attention

  • We present the shallow neural network-based malware classifier (SNNMAC), a model based on static features and shallow neural networks that classifies a Windows malware sample into a known family

  • The SNNMAC, the malware classification model proposed in this paper, targets portable executable (PE) files, the binary executable file format on Windows
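
As a rough illustration of the shallow architecture mentioned above, the following Python sketch runs a forward pass in which n-gram embeddings are average-pooled and passed through a single linear layer. For brevity it uses an ordinary softmax instead of the hierarchical softmax named in the abstract, so it is not the SNNMAC itself. The vocabulary size, embedding dimension, number of families, and all names are hypothetical, and the weights are random rather than trained.

```python
import math
import random
from typing import List


def average_pool(vectors: List[List[float]]) -> List[float]:
    """Average a list of equal-length embedding vectors element-wise."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (stand-in for hierarchical softmax)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(gram_ids: List[int],
            embeddings: List[List[float]],
            weights: List[List[float]]) -> List[float]:
    """Embed each n-gram id, average-pool, apply one linear layer, then softmax."""
    pooled = average_pool([embeddings[i] for i in gram_ids])
    scores = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
    return softmax(scores)


# Hypothetical sizes and random (untrained) parameters, for illustration only.
rng = random.Random(0)
vocab_size, dim, num_families = 5, 4, 3
embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_families)]

# A sample is represented as the ids of the n-grams it contains.
probs = predict([0, 2, 2, 4], embeddings, weights)
print(probs)
```

Because pooling reduces a sample of any length to a single fixed-size vector, the cost of a forward pass grows only linearly with the number of n-grams. This is consistent with the training-time advantage the abstract claims for shallow models, although the sketch does not measure it.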

Summary

Introduction

Malware has long been one of the main threats to cybersecurity, and the detection and analysis of malicious code have attracted sustained attention. Existing work emphasizes the depth of the neural network. Although this has achieved good classification results, it brings a number of problems, including parameters that are difficult to tune, high computation and storage costs, and low analysis efficiency, which make it difficult to apply in settings with a huge amount of malicious code.

Related Work
Classification Methodology
Overview of Classification Model
Opcode Sequences Process
An Improved n-gram Algorithm Based on Control Transfer Instructions
The Shallow Neural Network
Input Layer
GHz Intel Core i5 8 GB 1600 MHz DDR3
Model Performance
The CTIB-n-gram Algorithm
Method NB LR SVM RF XGB
Conclusions and Future Work