Malware identification using visualization images and deep learning

Sang Ni,Quan Qian,Rui Zhang

doi:10.1016/j.cose.2018.04.005

Abstract

Currently, malware is one of the most serious threats to Internet security. In this paper we propose a malware classification algorithm that uses static features called MCSC (Malware Classification using SimHash and CNN) which converts the disassembled malware codes into gray images based on SimHash and then identifies their families by convolutional neural network. During this process, some methods such as multi-hash, major block selection and bilinear interpolation are used to improve the performance. Experimental results show that MCSC is very effective for malware family classification, even for those unevenly distributed samples. The classification accuracy can be 99.260% at best and 98.862% at average on a malware dataset of 10,805 samples which is higher than other compared algorithms. Moreover, for MCSC, on average, it just takes 1.41 s to recognize a new sample, which can meet the requirements in most of the practical applications.

Full Text