Abstract

Deep learning methods have been applied to malware detection. However, deep learning models are not robust: they can easily be fooled by adversarial samples. In this paper, we study how to generate adversarial malware samples that evade deep learning-based detectors. Gradient-based methods are commonly used to generate adversarial samples, but they operate case by case, which makes generating a large number of adversarial samples very time-consuming. To address this issue, we propose a novel method for generating adversarial malware samples. Unlike gradient-based methods, we extract feature byte sequences from benign samples. These feature byte sequences represent the characteristics of benign samples and can influence the classification decision. We inject them directly into malware samples to produce adversarial samples. Because the feature byte sequences can be shared across samples, the method can efficiently generate a large number of adversarial samples. We compare the proposed method with random injection and with gradient-based methods. The experimental results show that the adversarial samples generated with our method achieve a high success rate.
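To illustrate the idea, here is a minimal Python sketch of the injection step, assuming the feature byte sequences have already been extracted from benign samples. The names (inject, generate_adversarial_set, the *.bin file layout) are hypothetical illustrations rather than the authors' implementation, and appending bytes at the end of the file is only one possible functionality-preserving injection point.

    from pathlib import Path

    def inject(malware_bytes: bytes, feature_sequences: list[bytes]) -> bytes:
        """Append benign feature byte sequences to a malware binary.

        Appending to the end of the file (the overlay, for PE files) is one
        common choice that keeps the original program functional while adding
        benign-looking content for the detector.
        """
        return malware_bytes + b"".join(feature_sequences)

    def generate_adversarial_set(malware_dir: str,
                                 feature_sequences: list[bytes],
                                 out_dir: str) -> None:
        """Reuse the same shared feature sequences for every malware sample,
        so producing N adversarial samples needs no per-sample optimization."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        for path in Path(malware_dir).glob("*.bin"):
            adversarial = inject(path.read_bytes(), feature_sequences)
            (out / path.name).write_bytes(adversarial)

Because the extracted sequences are shared, the per-sample cost is a single concatenation and file write, which is what makes large-scale generation cheap compared with per-sample optimization.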

Highlights

  • Deep neural networks have been successfully applied in different fields, such as computer vision and natural language processing

  • Because the feature sequences can be shared by all adversarial samples, the proposed method is suitable for generating a large number of adversarial samples

  • In this paper we study how to generate adversarial malware samples


Summary

Introduction

Deep neural networks have been successfully applied in different fields, such as computer vision and natural language processing, and experimental results show that deep learning-based malware detectors can achieve high detection accuracy. Despite these successes, deep learning methods are sensitive to small perturbations of their input. Szegedy et al. [5] found that small changes to input samples can cause classification errors; such perturbed samples are called adversarial samples. Existing gradient-based attacks translate only one source malware sample into a corresponding adversarial malware sample at a time, typically by optimizing padding bytes injected into that sample. If the number of padding bytes that must be injected into a malware sample is large, the time cost for generating a single adversarial sample is very high, so these methods are not suitable for generating a large number of adversarial samples.
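The sketch below illustrates why gradient-based generation is case by case, assuming a differentiable byte-level detector (a MalConv-style model is used only as a stand-in, and all names here are placeholders, not the attacks cited in the paper): every malware sample needs its own optimization loop over its padding bytes.

    import torch
    import torch.nn as nn

    class TinyByteDetector(nn.Module):
        """Illustrative stand-in for a byte-level malware detector."""
        def __init__(self, dim: int = 8):
            super().__init__()
            self.embed = nn.Embedding(256, dim)   # one embedding per byte value
            self.head = nn.Linear(dim, 1)

        def classify(self, embedded: torch.Tensor) -> torch.Tensor:
            # Mean-pool byte embeddings, then output a malicious score in [0, 1].
            return torch.sigmoid(self.head(embedded.mean(dim=1)))

    def gradient_padding_attack(model, sample_bytes, num_padding=1024, steps=100):
        """Optimize appended padding bytes for ONE sample (case-by-case cost)."""
        with torch.no_grad():
            body = model.embed(sample_bytes.unsqueeze(0))           # (1, L, D)
        pad = torch.randn(1, num_padding, body.size(-1), requires_grad=True)
        opt = torch.optim.Adam([pad], lr=0.01)
        for _ in range(steps):
            opt.zero_grad()
            score = model.classify(torch.cat([body, pad], dim=1))   # malicious prob.
            score.mean().backward()                                  # descend on the score
            opt.step()
        # Snap each optimized padding embedding to its nearest real byte value.
        dists = torch.cdist(pad.detach().squeeze(0), model.embed.weight)
        return dists.argmin(dim=1).to(torch.uint8)

    model = TinyByteDetector()
    malware = torch.randint(0, 256, (4096,))
    padding_bytes = gradient_padding_attack(model, malware)

Each call to gradient_padding_attack runs a full optimization loop for a single sample, which is the per-sample cost the proposed shared-feature injection avoids.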

