Abstract

Deep learning models are widely used in source code processing tasks such as code captioning, code summarization, code completion, and code classification. Recent studies have shown that deep learning-based source code processing models are vulnerable: attackers can generate adversarial examples by adding perturbations to source programs. Existing attack methods perturb a source program by renaming one or more variables; they do not consider perturbations based on semantically equivalent structural transformations of the code. We propose a set of program transformations, covering both identifier renaming and structural transformations, that preserve the program's original semantics while fooling the source code processing model into changing its original prediction. Building on these, we propose a novel method that applies semantics-preserving structural transformations to attack source code processing models in the white-box setting; to our knowledge, this is the first use of semantics-preserving structural transformations to generate adversarial examples for such models. Our method first identifies the important tokens in a program by computing the contribution value of each part of the program, then selects the best transformation for each important token to generate a semantics-preserving adversarial example. Experimental results show that our attack method improves the attack success rate by 8.29% on average over the state-of-the-art attack method, and that adversarial training with the adversarial examples it generates reduces the attack success rate against source code processing models by 21.79% on average.
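
To make the two-step procedure concrete, the following is a minimal sketch of the attack loop described above, assuming white-box (gradient) access to a PyTorch classifier. All concrete names here (TinyClassifier, rename_identifier, for_to_while, the toy vocabulary) are hypothetical illustrations rather than the paper's actual implementation; in particular, the two transformations are simplified placeholders for the full semantics-preserving rewrites.

```python
# A minimal sketch: rank tokens by contribution value, then greedily apply
# the semantics-preserving transformation that most lowers the model's
# confidence in its original prediction. All names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = {"<pad>": 0, "for": 1, "while": 2, "i": 3, "n": 4,
         "range": 5, "sum": 6, "x": 7}

def encode(tokens):
    return torch.tensor([[VOCAB[t] for t in tokens]])

class TinyClassifier(nn.Module):
    """Toy stand-in for a source code processing model."""
    def __init__(self, dim=16, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(len(VOCAB), dim)
        self.fc = nn.Linear(dim, n_classes)

    def forward(self, ids):
        # Mean-pool token embeddings, then classify.
        return self.fc(self.emb(ids).mean(dim=1))

def token_contributions(model, tokens, label):
    """Step 1: score each token by the gradient norm of the loss
    with respect to that token's embedding."""
    emb = model.emb(encode(tokens)).detach().requires_grad_(True)
    loss = F.cross_entropy(model.fc(emb.mean(dim=1)), torch.tensor([label]))
    loss.backward()
    return emb.grad.norm(dim=-1).squeeze(0)  # one score per token position

def confidence(model, tokens, label):
    with torch.no_grad():
        return F.softmax(model(encode(tokens)), dim=-1)[0, label].item()

# Placeholder transformations. A real implementation would rename only true
# identifiers (to fresh, unused names) and rewrite a for-loop into an
# equivalent while-loop, moving its init/condition/update so that the
# program's semantics are fully preserved.
def rename_identifier(tokens, pos, new_name):
    old = tokens[pos]
    return [new_name if t == old else t for t in tokens]

def for_to_while(tokens, pos):
    return [("while" if i == pos and t == "for" else t)
            for i, t in enumerate(tokens)]

def attack(model, tokens, label, k=2):
    # Step 1: rank token positions by their contribution values.
    order = token_contributions(model, tokens, label).argsort(descending=True)
    # Step 2: for each important position, keep the candidate transformation
    # that most lowers the model's confidence in its original prediction.
    for pos in order[:k].tolist():
        candidates = [for_to_while(tokens, pos),
                      rename_identifier(tokens, pos, "x")]
        tokens = min(candidates, key=lambda c: confidence(model, c, label))
        if confidence(model, tokens, label) < 0.5:
            break  # prediction flipped: adversarial example found
    return tokens

model = TinyClassifier()
print(attack(model, ["for", "i", "range", "n", "sum", "i"], label=0))
```

In this sketch the greedy loop accepts whichever candidate lowers confidence the most and stops as soon as the prediction flips; a full implementation would draw candidates from the whole transformation set and re-rank token importance after each accepted edit.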
