Abstract
The anti-detection capabilities of adversarial malware examples have drawn the attention of antivirus vendors and researchers. In black-box scenarios, where internal information about the target model is inaccessible, existing reinforcement learning (RL)-based methods can evade Windows PE malware detectors by exploiting the agent's ability to adjust its strategy based on feedback from the environment. However, obtaining evasion rewards as positive feedback is difficult in the black-box setting, resulting in low training efficiency. To address this issue, we introduce an intrinsic curiosity reward into the framework to motivate the agent to explore unknown state spaces and learn effective evasion strategies. Additionally, we employ a generative adversarial network (GAN) to produce varied synthetic data, which replaces random or benign bytes as the adversarial payloads of the agent's actions, improving attack capability and reducing the risk that hard-coded adversarial perturbations become detection anchors. We compare the attack performance with other RL-based baseline methods; experimental results show that our framework is more flexible and effective, achieving a 63%-85% attack success rate against EMBER, FireEye, and MalConv. Even when defensive measures are taken, the proposed method retains a certain attack capability, with success rates between 48% and 67%.
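As an illustration of the curiosity mechanism the abstract refers to, the sketch below shows one common way an intrinsic curiosity reward can be computed: a forward model predicts the feature embedding of the next state from the current state and action, and the prediction error serves as the reward. All dimensions, weights, and function names here are hypothetical placeholders, not the paper's actual implementation; in practice the encoder and forward model would be learned networks trained alongside the agent.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, FEAT_DIM = 8, 4, 16  # hypothetical sizes

# Hypothetical fixed random feature encoder phi(s); a trained network in practice.
W_enc = rng.normal(size=(STATE_DIM, FEAT_DIM))

# Forward-model weights: predict phi(s') from [phi(s), one_hot(a)].
W_fwd = rng.normal(size=(FEAT_DIM + ACTION_DIM, FEAT_DIM)) * 0.1

def encode(state):
    """Embed a raw state vector into feature space."""
    return np.tanh(state @ W_enc)

def curiosity_reward(state, action, next_state, eta=0.5):
    """Intrinsic reward = scaled squared prediction error of the forward model."""
    phi, phi_next = encode(state), encode(next_state)
    a_onehot = np.eye(ACTION_DIM)[action]
    phi_pred = np.concatenate([phi, a_onehot]) @ W_fwd
    return eta * float(np.sum((phi_pred - phi_next) ** 2))

s = rng.normal(size=STATE_DIM)
s_next = rng.normal(size=STATE_DIM)
r_int = curiosity_reward(s, action=2, next_state=s_next)
```

Because the error is large in poorly modelled (i.e. rarely visited) regions of the state space, adding `r_int` to the sparse evasion reward encourages the agent to keep exploring even when the detector gives no positive feedback.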