DeepMal: maliciousness-Preserving adversarial instruction learning against static malware detection

Chun Yang, Yanna Wu, Yu Wen, Jinghui Xu, Shuangshuang Liang, Dan Meng, Boyang Zhang

doi:10.1186/s42400-021-00079-5

Abstract

Beyond the explosively successful applications of deep learning (DL) in natural language processing, computer vision, and information retrieval, numerous Deep Neural Network (DNN)-based alternatives have been proposed for common security-related scenarios, with malware detection among the more popular ones. Recently, adversarial learning has gained much attention. However, unlike in computer vision applications, a malware adversarial attack is expected to preserve the original malicious semantics of the malware. This paper proposes DeepMal, a novel adversarial instruction learning technique against static malware detection. To the best of our knowledge, DeepMal is the first practical and systematic adversarial learning method that can directly produce adversarial samples and effectively bypass static malware detectors powered by DL and machine learning (ML) models while preserving attack functionality in the real world. Moreover, our method conducts small-scale attacks, which can evade typical malware variant analysis (e.g., duplication checks). We evaluate DeepMal on two real-world datasets, six typical DL models, and three typical ML models. Experimental results demonstrate that, on both datasets, DeepMal can attack typical malware detectors, decreasing the mean F1-score and the F1-score by up to 93.94% and 82.86%, respectively. In addition, samples of three typical malware types (Trojan horses, Backdoors, Ransomware) are shown to preserve their original attack functionality, and the mean duplication check ratio of the malware adversarial samples is below 2.0%. Finally, DeepMal can also evade dynamic detectors and can easily be enhanced by learning more dynamic features with specific constraints.
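The abstract's central constraint is that an evasive sample must keep all of its original malicious behavior. One simple way to picture that constraint is to leave every original byte untouched and only add new bytes after them. The Python sketch below does this against a hypothetical linear detector over a byte histogram; `byte_histogram`, `make_linear_detector`, and `append_attack` are illustrative names, not DeepMal's API. It is a swapped-in, much simpler technique (greedy byte appending), not DeepMal's adversarial instruction learning, and a real PE file would raise further issues (headers, overlay handling) that this toy ignores.

```python
from typing import Callable, List


def byte_histogram(data: bytes) -> List[float]:
    """Normalized 256-bin byte histogram, a simple toy static feature."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    n = max(len(data), 1)
    return [c / n for c in counts]


def make_linear_detector(weights: List[float], bias: float) -> Callable[[bytes], float]:
    """Hypothetical toy detector: a linear score over the byte histogram.
    A score > 0 means 'malware'; a score <= 0 means 'benign'."""
    def score(data: bytes) -> float:
        h = byte_histogram(data)
        return sum(w * x for w, x in zip(weights, h)) + bias
    return score


def append_attack(sample: bytes, score: Callable[[bytes], float], budget: int) -> bytes:
    """Greedily append up to `budget` bytes, each time choosing the byte value
    that lowers the detector score the most. The original bytes are never
    modified or reordered: new bytes are only added after them."""
    adv = bytearray(sample)
    for _ in range(budget):
        if score(bytes(adv)) <= 0:
            break  # already classified as benign; stop to keep the change small
        best = min(range(256), key=lambda v: score(bytes(adv) + bytes([v])))
        adv.append(best)
    return bytes(adv)


# Usage with a toy detector that "dislikes" 0x90 bytes and treats 0x00 as benign.
weights = [0.0] * 256
weights[0x90] = 5.0
weights[0x00] = -1.0
detector = make_linear_detector(weights, bias=-1.0)
malware = bytes([0x90] * 8 + [0xCC] * 2)
adversarial = append_attack(malware, detector, budget=32)
print(detector(malware) > 0, detector(adversarial) <= 0)  # prints: True True
```

Stopping as soon as the score crosses the decision threshold keeps the modification small, which loosely mirrors the abstract's emphasis on small-scale attacks.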

Highlights

Malware is a significant concern of cybersecurity because of its severe damage and threats to network and computing device security
For the Kaggle data, classification accuracy is the proportion of samples whose predicted class is correct, and mean F1 is the mean of the F1-scores of the two malware classes; for the phd-data, Precision is the proportion of predicted malware samples that are truly malware, and Recall is the proportion of malware samples that are caught by the models
The results show that DeepMal could effectively bypass malware detectors powered by deep learning (DL) and machine learning (ML)
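The metrics named in the highlights can be computed as in the plain-Python sketch below, assuming hashable class labels. Two points are assumptions rather than statements from the paper: `mean_f1` is taken to be the unweighted (macro) mean of per-class F1-scores, and `relative_decrease` treats the reported drops as relative to the pre-attack value instead of as percentage points.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                        positive: Hashable) -> Tuple[float, float, float]:
    """Precision: fraction of samples predicted `positive` that truly are positive.
    Recall: fraction of truly positive samples that the model catches.
    F1: harmonic mean of precision and recall (0.0 when both are 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def mean_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
            classes: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1-scores (ASSUMPTION: macro averaging)."""
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in classes) / len(classes)


def relative_decrease(before: float, after: float) -> float:
    """Percentage drop of a metric relative to its pre-attack value.
    ASSUMPTION: the reported drops might instead be percentage points."""
    return (before - after) / before * 100.0


# Usage on a tiny hypothetical two-class prediction.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(accuracy(y_true, y_pred), precision_recall_f1(y_true, y_pred, 1))
print(mean_f1(y_true, y_pred, [0, 1]), relative_decrease(0.75, 0.15))
```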

Summary

Introduction

Malware is a significant concern of cybersecurity because of its severe damage and threats to network and computing device security. Static malware detection (Anderson and McGrew 2017; Banescu et al 2017; Burnaev and Smolyakov 2016; Dahl et al 2013; Gardiner and Nagaraja 2016; Saxe and Berlin 2015; Ye et al 2017), one of the promising defense techniques in the security community, aims to accurately identify the binary files of malware … vulnerable to well-tuned perturbations generated by adversarial learning techniques on the original data samples, known as adversarial attacks. Studying such attacks on static malware detection helps improve the robustness of AI-driven detection models. It is difficult for malware samples to bypass CNN-based detectors.
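To make "well-tuned perturbations generated by adversarial learning techniques" concrete, the sketch below applies one fast gradient sign method (FGSM) step to a hypothetical logistic-regression detector over normalized feature vectors. The weights, features, and function names are illustrative assumptions, not taken from the paper. The perturbation works purely in feature space and makes no attempt to keep the underlying binary runnable, which is exactly the gap a maliciousness-preserving attack has to close.

```python
import math
from typing import List


def malware_probability(w: List[float], b: float, x: List[float]) -> float:
    """Hypothetical logistic-regression detector: P(malware | x) = sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> float:
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def fgsm_evasion(w: List[float], b: float, x: List[float], eps: float,
                 lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """One FGSM step for a sample whose true label is 'malware' (y = 1).
    With cross-entropy loss L = -log p, dL/dx = (p - 1) * w, and p - 1 < 0 for any
    finite logit, so sign(dL/dx) = -sign(w). The step x + eps * sign(dL/dx),
    i.e. x - eps * sign(w), raises the loss and thus lowers the malware score.
    Each feature is clipped back into the valid range [lo, hi]."""
    return [min(hi, max(lo, xi - eps * sign(wi))) for wi, xi in zip(w, x)]


# Usage with illustrative weights and a sample the detector flags as malware.
w = [2.0, -1.0, 0.5]
b = -0.5
x = [0.9, 0.1, 0.8]
x_adv = fgsm_evasion(w, b, x, eps=0.5)
print(malware_probability(w, b, x), malware_probability(w, b, x_adv))
```

For a linear model the gradient sign is just the sign of the weights, so a single step suffices here; deeper detectors such as CNNs typically need gradients from automatic differentiation and often several iterated steps.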

Methods

Results

Conclusion