Abstract

As the number of malware variants keeps growing, classifying malware into families is crucial. However, the operand semantics of assembly instructions depend strongly on the operating environment and are difficult to extract. As a result, instruction semantics are lost and malware variants are hard to classify correctly. Previous research also does not mine the internal structural features of instructions or the contextual relationships between them, which makes it difficult to identify malware variants efficiently. Motivated by these limitations, this article presents a malware family classification method called EII-MBS (enhanced instruction-level behavior semantics learning). By abstracting operands into types, the method separates operand semantics from the constraints of the operating environment. It then mines the structure, relationship, and context information of the instructions and embeds these three aspects of instruction behavior semantics into a vector representation, which is used to build malware feature images. The method also applies channel attention to capture important features. In addition to the widely used Microsoft Malware Classification Challenge dataset, we are among the first to conduct experiments on the recently released BODMAS dataset. EII-MBS reaches average accuracies of 99.40% and 99.26% on the two datasets, respectively. Further experiments with different proportions of training and testing data show that our method achieves state-of-the-art malware family classification performance.
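To make the operand abstraction step concrete, the following is a minimal Python sketch of replacing concrete operands with coarse type tokens. It is an illustration only, not the EII-MBS implementation: the type labels (`REG`, `MEM`, `IMM`, `OTHER`), the short register list, and the function names `abstract_operand` and `abstract_instruction` are all assumptions introduced here.

```python
import re

# Hypothetical, deliberately incomplete set of x86 register names.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

# Immediate literals: 0x-prefixed hex, digit-led hex with an 'h' suffix, or decimal.
IMMEDIATE = re.compile(r"(0x[0-9a-f]+|\d[0-9a-f]*h|\d+)")


def abstract_operand(operand: str) -> str:
    """Map one concrete operand to an assumed coarse type token."""
    op = operand.strip().lower()
    if op in REGISTERS:
        return "REG"
    if "[" in op and "]" in op:
        return "MEM"  # any bracketed expression is treated as a memory reference
    if IMMEDIATE.fullmatch(op):
        return "IMM"
    return "OTHER"  # labels, symbols, and anything this sketch does not recognize


def abstract_instruction(line: str) -> str:
    """Keep the mnemonic and replace each operand by its type token."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [o for o in rest.split(",") if o.strip()]
    return " ".join([mnemonic.lower()] + [abstract_operand(o) for o in operands])


if __name__ == "__main__":
    for asm in ["mov eax, [ebp+8]", "push 0x401000", "call sub_401000", "ret"]:
        print(asm, "->", abstract_instruction(asm))
```

With this kind of mapping, two instructions that differ only in environment-specific addresses or register choices yield the same token sequence, which is the property the abstract attributes to operand type abstraction.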
