Abstract

Malware classification plays a crucial role in addressing the explosive increase in malware variants. By classifying malware instances into families, analysts can apply the appropriate techniques and tools to handle the variants in each family. High-level representations of malware, such as disassembled code, yield strong classification performance. However, classification based on disassembled code rests on the impractical assumption that every malware sample is correctly reversed by a disassembler. Unfortunately, sophisticated malware with anti-disassembly capabilities deliberately confuses disassemblers, which then produce incorrectly disassembled code. In this study, we focus on malware family classification that requires no disassembly. We propose a new CNN-based malware classification model that operates on non-disassembled malware files (i.e., binary files). Our model associates two modalities, “malware images” and “structural entropies,” which are converted and extracted, respectively, from the binary files. The two modalities work at different granularities (bytes and chunks, respectively) and complement each other. The model adopts a cross-modal attention mechanism that combines the features of the two modalities while compensating for the expressive limitations of each. We validate our model on three popular datasets: Kaggle Microsoft Malware Classification, Malimg, and BODMAS. The experimental results show that our model identifies malware families more accurately than previous methods, without the burden of disassembly.
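The two input modalities can be illustrated with a short sketch. This is not the authors' implementation. The image width of 256 bytes, the chunk size of 256 bytes, the zero-padding of the last image row, and the function names `bytes_to_image`, `shannon_entropy`, and `structural_entropy` are all hypothetical choices made for illustration. The CNN and cross-modal attention stages are omitted.

```python
import math
from collections import Counter


def bytes_to_image(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape raw bytes into a 2-D grayscale 'image', one pixel (0-255) per byte.

    Assumption: a fixed row width with the final row zero-padded; real
    pipelines often choose the width based on the file size.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad the final row with zeros
        rows.append(row)
    return rows


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte chunk in bits, ranging from 0.0 to 8.0."""
    if not chunk:
        return 0.0
    n = len(chunk)
    # sum of p * log2(1/p) over the byte values that occur in the chunk
    return sum((c / n) * math.log2(n / c) for c in Counter(chunk).values())


def structural_entropy(data: bytes, chunk_size: int = 256) -> list[float]:
    """Entropy of consecutive, non-overlapping chunks: a 1-D entropy 'signal'."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [shannon_entropy(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


if __name__ == "__main__":
    # hypothetical toy "binary": one high-entropy chunk, then one constant chunk
    sample = bytes(range(256)) + b"\x00" * 256
    img = bytes_to_image(sample, width=128)
    print(len(img), len(img[0]))            # 4 128
    print(structural_entropy(sample, 256))  # [8.0, 0.0]
```

The image preserves byte-level detail. The entropy sequence summarizes each chunk with a single number, so regions such as packed or encrypted sections (entropy near 8 bits) stand out at a coarser granularity.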
