Abstract

Adversarial training has been employed by researchers to protect AI models of source code against adversarial attacks. However, it is still unknown how adversarial training methods in this field compare to one another in terms of effectiveness and robustness. This study surveys existing adversarial training methods and conducts experiments to evaluate their performance on neural models in the domain of source code. First, we examine the adversarial training process and identify four dimensions along which existing methods can be classified into five categories: Mixing Directly, Composite Loss, Adversarial Fine-tuning, Min–max + Composite Loss, and Min–max. Second, we empirically evaluate these categories of adversarial training methods on two tasks (i.e., code summarization and code authorship attribution) to assess their effectiveness and robustness. Experimental results indicate that certain combinations of adversarial training techniques (i.e., min–max with a composite loss, or direct sample mixing with an ordinary loss) perform substantially better than other combinations or than individual techniques used alone. Our experiments also reveal that the robustness of defended models can be enhanced by using diverse input data for adversarial training, and that the number of fine-tuning epochs has little or no impact on model performance.
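
To make the Min–max + Composite Loss category concrete, the sketch below shows one training step that approximates the inner maximization with a single FGSM-style perturbation in embedding space and then minimizes a composite (clean + adversarial) loss. This is a minimal illustration under stated assumptions, not the paper's exact setup: the `TinyCodeClassifier` model, the embedding-space perturbation (standing in for discrete code transformations such as identifier renaming), and the `alpha` weighting are all hypothetical choices made for the example.

```python
# Minimal sketch of min-max adversarial training with a composite loss (PyTorch).
# Assumptions: a toy GRU classifier over token IDs; the inner max is a single
# FGSM-style step on embeddings rather than a discrete code transformation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCodeClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, num_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, 128, batch_first=True)
        self.head = nn.Linear(128, num_classes)

    def forward_from_embeddings(self, emb):
        _, h = self.encoder(emb)          # h: (num_layers, batch, hidden)
        return self.head(h[-1])

    def forward(self, tokens):
        return self.forward_from_embeddings(self.embed(tokens))

def minmax_composite_step(model, optimizer, tokens, labels,
                          epsilon=0.01, alpha=0.5):
    """One step: inner max crafts a perturbation, outer min uses a composite loss."""
    model.train()
    emb = model.embed(tokens)

    # Inner maximization: one gradient-ascent (FGSM-style) step on the embeddings.
    emb_adv = emb.detach().clone().requires_grad_(True)
    loss_inner = F.cross_entropy(model.forward_from_embeddings(emb_adv), labels)
    grad, = torch.autograd.grad(loss_inner, emb_adv)
    emb_adv = (emb_adv + epsilon * grad.sign()).detach()

    # Outer minimization: composite loss mixes clean and adversarial terms.
    logits_clean = model.forward_from_embeddings(emb)
    logits_adv = model.forward_from_embeddings(emb_adv)
    loss = alpha * F.cross_entropy(logits_clean, labels) \
         + (1 - alpha) * F.cross_entropy(logits_adv, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage on random data (shapes only; not real source-code inputs).
model = TinyCodeClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(0, 1000, (8, 32))
labels = torch.randint(0, 10, (8,))
print(minmax_composite_step(model, opt, tokens, labels))
```

Setting `alpha=1.0` would reduce this to ordinary training, while dropping the clean term recovers a pure min–max scheme; the composite loss sits between the two, which is one way to read the categories compared in the study.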
