Abstract

Deep Neural Networks (DNNs) have enabled remarkable breakthroughs in fields such as computer vision and autonomous driving. Because of their adaptability to malware evolution, security analysts heavily utilise end-to-end DNNs in malware detection systems. Unfortunately, adversarial samples can cause these classifiers to produce erroneous outputs. Such samples pose major security and privacy risks, since a malware detection model may mistakenly label a malware sample as benign. In this paper, we assess the resilience and reliability of a deep learning-based malware detection model. We employ the MalConv architecture for malware detection and classification, trained on the Microsoft Malware Dataset. We use the Fast Gradient Sign Method (FGSM), a white-box gradient-based attack, to generate adversarial samples against our detection model. Based on the model's performance under this attack, we present a comparative study of mitigation techniques, namely adversarial training, ensemble methods, and defensive distillation, analysing how effectively each addresses the problem. Finally, we propose a novel approach, Iterative Distilled Adversarial Training, which combines two of these defence mechanisms, adversarial training and defensive distillation, to make the model more resilient to adversarial attacks in a white-box setting. With this approach, we reduce the FGSM attack success rate by around 75% with only a small increase in training time. Moreover, unlike multi-model defence strategies such as ensemble learning, our technique uses a single architecture while offering stronger defensive capabilities, lowering the attack success rate by a further 15% relative to ensembles.
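For illustration, the sketch below shows a generic single-step FGSM perturbation in PyTorch. The function name `fgsm_perturb`, the `epsilon` parameter, and the use of a standard cross-entropy loss are assumptions for the example; the paper's own attack operates on MalConv's byte-level inputs, which require additional handling because raw bytes are discrete.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """Single-step FGSM: x_adv = x + epsilon * sign(grad_x J(theta, x, y)).

    Assumes a differentiable classifier and continuous inputs
    (e.g. an embedded byte representation); raw byte sequences
    are discrete and would need an extra projection step.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)  # J(theta, x, y)
    loss.backward()
    # Step in the direction that maximises the loss.
    return (x + epsilon * x.grad.sign()).detach()
```

Adversarial training, one of the defences compared in the paper, would mix such perturbed samples back into each training batch so the model learns to classify them correctly.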
