Detecting Malware with an Ensemble Method Based on Deep Neural Network

Jinpei Yan, Yong Qi, Qifan Rao

doi:10.1155/2018/7247095

Abstract

Malware detection plays a crucial role in computer security. Most recent research uses machine learning methods that rely heavily on domain knowledge to extract malicious features manually. In this paper, we propose MalNet, a novel malware detection method that learns features automatically from raw data. Concretely, we first generate a grayscale image from each file while extracting its opcode sequences with the disassembly tool IDA. MalNet then uses a CNN to learn from the grayscale image and an LSTM network to learn from the opcode sequence, and combines the two with a stacking ensemble for malware classification. We perform experiments on more than 40,000 samples, including 20,650 benign files collected from online software providers and 21,736 malware samples provided by Microsoft. The evaluation shows that MalNet achieves 99.88% validation accuracy for malware detection. We also conduct a malware family classification experiment on 9 malware families to compare MalNet with related works; MalNet outperforms most of them with 99.36% detection accuracy and achieves a considerable speed-up in detection compared with two state-of-the-art results on the Microsoft malware dataset.
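The grayscale-image step can be illustrated with a minimal sketch. It assumes the common approach of reading a file's raw bytes and laying them out row by row at a fixed width, one byte per 8-bit pixel. The width of 256, the zero padding of the last row, and the nearest-neighbour resize to a fixed CNN input size are illustrative assumptions, not parameters taken from MalNet, and the function names `bytes_to_grayscale` and `resize_nearest` are hypothetical.

```python
from __future__ import annotations

import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay raw file bytes out as rows of 8-bit grayscale pixels.

    Each byte (0-255) becomes one pixel intensity. The fixed width is an
    illustrative assumption; the last row is zero-padded to full width.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = max(1, math.ceil(len(data) / width))
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def resize_nearest(img: list[list[int]], out_h: int, out_w: int) -> list[list[int]]:
    """Nearest-neighbour resize to a fixed size (an assumed CNN input shape)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
```

For example, `bytes_to_grayscale(b"\x00\x7f\xff\x10\x20", width=2)` yields three rows, `[[0, 127], [255, 16], [32, 0]]`, with the final pixel zero-padded. Because file sizes vary, some resizing or cropping step is needed before a CNN with a fixed input shape can consume the image; nearest-neighbour is used here only because it keeps the original byte values.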

Highlights

Nowadays, software of all kinds provides users with a wealth of resources but also brings potential dangers, so malware detection remains a major concern in the computer security field.
To evaluate and optimize MalNet, we focus on four parts: the performance of the MalNet Convolutional Neural Network (CNN), the performance of the MalNet Long Short-Term Memory (LSTM) network, the stacking ensemble result, and a comparison with other works.
Since the grayscale image contains a wealth of fine-grained local information, we expect VGGNet to perform better, because its deeper network structure can capture more localized image correlations.
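The stacking ensemble named in the highlights can be sketched as follows. In stacking, the base models' outputs on held-out data become the input features of a second-level (meta) classifier. This sketch assumes each base model (the CNN on images, the LSTM on opcode sequences) emits a malware probability, and it uses plain logistic regression trained by full-batch gradient descent as the meta-classifier. This section does not say which meta-classifier MalNet uses, so that choice, the learning rate and epoch count, and the helper names are assumptions.

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_meta_learner(
    base_probs: list[list[float]],
    labels: list[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> tuple[list[float], float]:
    """Fit a logistic-regression meta-classifier on stacked base outputs.

    base_probs[i] holds the base models' malware probabilities for sample i
    (e.g. [p_cnn, p_lstm]). They should come from data the base models were
    not trained on; otherwise the meta-learner learns to trust overfit outputs.
    labels[i] is 1 for malware and 0 for benign.
    """
    n, d = len(labels), len(base_probs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(base_probs, labels):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step along the mean gradient over the batch.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_stacked(w: list[float], b: float, x: list[float]) -> float:
    """Final malware probability from one sample's base-model outputs."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

With hypothetical toy held-out outputs such as `[[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]` and labels `[1, 1, 1, 0, 0, 0]`, the trained meta-learner scores `[1.0, 1.0]` above 0.5 and `[0.0, 0.0]` below it. Learning the combination, rather than averaging the two probabilities with fixed weights, lets the meta-learner lean on whichever base model proves more reliable on held-out data.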

Summary

Introduction

Software of all kinds provides users with a wealth of resources but also brings potential dangers, so malware detection remains a major concern in the computer security field. The number of malware samples is so large that a highly effective way to detect malware is required. Many studies have examined methods for analyzing and detecting malware. Traditional commercial antivirus products usually rely on signature-based methods, which need a local signature database to store patterns that experts have extracted from malware. This approach has serious limitations: even minor changes to a malware sample can change its signature, so more and more malware evades signature-based detection through encryption, obfuscation, or packing. Many machine-learning-based malware detection approaches have been proposed in recent years, such as static analysis, which learns statistical characteristics like API calls and N-grams [3, 4], and dynamic behavior analysis [5]. However, malware uses packing technologies to prevent reverse engineering, which makes static analysis costly.
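The contrast between brittle signatures and statistical features can be illustrated with a small sketch. It makes two simplifying assumptions: the "signature" is a SHA-256 digest of the whole file (real antivirus signatures are usually byte patterns or more elaborate rules), and the statistical feature is a count of overlapping byte bigrams, one simple form of the N-gram features mentioned above. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def hash_signature(data: bytes) -> str:
    """Simplest possible signature: a digest of the entire file."""
    return hashlib.sha256(data).hexdigest()


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams, a common static feature."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def ngram_similarity(a: Counter, b: Counter) -> float:
    """Weighted Jaccard similarity of two n-gram count profiles."""
    keys = set(a) | set(b)
    inter = sum(min(a[k], b[k]) for k in keys)
    union = sum(max(a[k], b[k]) for k in keys)
    return inter / union if union else 1.0


def mutate_one_byte(data: bytes, index: int, value: int) -> bytes:
    """Return a copy of data with a single byte replaced."""
    out = bytearray(data)
    out[index] = value
    return bytes(out)
```

For a 64-byte sample `bytes(range(64))`, changing the byte at index 10 to `0xFF` produces a completely different digest, while the bigram similarity stays at 61/65 (about 0.94), because only the two bigrams that touch the altered byte change. Statistical features of this kind degrade gracefully under small edits that defeat an exact signature.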

Methods

Findings

Discussion

Conclusion