Abstract

Recent advances in machine learning offer attractive tools for sophisticated adversaries. By employing a dedicated perturbation method, an attacker can transform malware into an adversarial version that retains its malicious functionality. Such adversarial malware examples have proven effective at bypassing antivirus engines. However, recent works leverage only a single perturbation method to generate adversarial examples, which cannot defeat advanced detectors. In this paper, we propose a reinforcement learning-based framework called MalInfo, which generates powerful adversarial malware examples that evade third-party detectors by adaptively selecting a perturbation path for each sample in our collected dataset of 1,000 diverse malware samples. To cope with limited computation, MalInfo applies either dynamic programming or temporal difference learning to choose the optimal perturbation path, where each path is formed by a combination of Obfusmal, Stealmal, and Hollowmal. We provide a proof-of-concept implementation and an extensive evaluation of our framework. Both the detection rate and the evasive rate are substantially improved compared with the state-of-the-art MalFox (Zhong et al., 2021). Specifically, the average detection rates for dynamic programming and temporal difference learning are 23.2% (21.9% lower than MalFox) and 27.5% (7.4% lower than MalFox), respectively, and the average evasive rates are 65.8% (17.1% higher than MalFox) and 59.4% (5.7% higher than MalFox), respectively.
