The arms race: Adversarial search defeats entropy used to detect malware

Héctor D Menéndez,Sukriti Bhattacharya,David Clark,Earl T Barr

doi:10.1016/j.eswa.2018.10.011

Abstract

Malware creators have been getting their way for too long now. String-based similarity measures can leverage ground truth in a scalable way and can operate at a level of abstraction that is difficult to combat from the code level. At the string level, information theory and, specifically, entropy play an important role in detecting patterns altered by concealment strategies, such as polymorphism or encryption. Controlling the entropy levels in different parts of a disk-resident executable allows an analyst to detect malware or a black hat to evade detection. This paper embodies these two perspectives in two scalable entropy-based tools: EnTS and EEE. EnTS, the detection tool, shows the effectiveness of detecting entropy patterns, achieving 100% precision with 82% accuracy. It outperforms VirusTotal in accuracy on combined Kaggle and VirusShare malware. EEE, the evasion tool, shows the effectiveness of entropy as a concealment strategy, attacking binary-based state-of-the-art detectors. It learns their detection patterns in up to 8 generations of its search process and increases their false negative rate from the range 0–9% to the range 90–98.7%.

Highlights

Arms races alternate between incremental and disruptive moves like the stockpiling of armaments and the invention of airplanes
In packing detection, Entropy Time Series (EnTS) is more accurate than compression rate (CR) and Structural Entropy (SEnt), and similar in accuracy to Normalised Compression Distance (NCD)

Summary

Introduction

Arms races alternate between incremental and disruptive moves like the stockpiling of armaments and the invention of airplanes. The malware detection/evasion arms race is no exception. Its history exhibits periods of minor moves and counter-moves, like tweaking malware to avoid a known signature, and of disruptive moves, like the transition to polymorphic concealment. Our core contribution is to show how to use search to restrict the adversary to making only disruptive moves. Given an evasion or detection technique, we use machine learning to search for transformations that produce variants that force the adversary to make expensive, disruptive moves. The specific detection and evasion techniques we consider use information-theoretic entropy.
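
To make the notion of entropy levels in different parts of an executable concrete, the following minimal sketch computes the Shannon entropy of consecutive fixed-size chunks of a byte string. It is an illustration under stated assumptions, not the EnTS or EEE implementation: the function names `shannon_entropy` and `entropy_series` and the 256-byte chunk size are hypothetical choices made for this example.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1 / p) over the byte values that actually occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_series(data: bytes, chunk_size: int = 256) -> list[float]:
    """Entropy of each consecutive chunk of `data`.

    The chunk size is an assumption for illustration only; the last
    chunk may be shorter than `chunk_size`.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        shannon_entropy(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


if __name__ == "__main__":
    # Synthetic input: zero padding, repetitive text, then random bytes
    # standing in for encrypted or packed content.
    sample = bytes(512) + b"mov eax, ebx; " * 40 + os.urandom(512)
    for index, value in enumerate(entropy_series(sample)):
        print(f"chunk {index}: {value:.2f} bits/byte")
```

On such an input, constant padding yields zero entropy, repetitive text yields low entropy, and random-looking bytes approach the 8 bits/byte maximum. This contrast is the kind of pattern that concealment strategies such as encryption tend to alter.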
