Exploring Semantic Redundancy using Backdoor Triggers: A Complementary Insight into the Challenges facing DNN-based Software Vulnerability Detection

Changjie Shao,Jun Wu,Xi Zheng,Gaolei Li

doi:10.1145/3640333

Abstract

To detect software vulnerabilities with better performance, deep neural networks (DNNs) have received extensive attention recently. However, these vulnerability detection DNN models trained with code representations are vulnerable to specific perturbations on code representations. This motivates us to rethink the bane of software vulnerability detection and find function-agnostic features during code representation which we name as semantic redundant features. This paper first identifies a tight correlation between function-agnostic triggers and semantic redundant feature space (where the redundant features reside) in these DNN models. For correlation identification, we propose a novel Backdoor-based Semantic Redundancy Exploration (BSemRE) framework. In BSemRE, the sensitivity of the trained models to function-agnostic triggers is observed to verify the existence of semantic redundancy in various code representations. Specifically, acting as the typical manifestations of semantic redundancy, naming conventions, ternary operators and identically-true conditions are exploited to generate function-agnostic triggers. Extensive comparative experiments on 1613823 samples of 8 representative vulnerability datasets and state-of-the-art code representation techniques and vulnerability detection models demonstrate that the existence of semantic redundancy determines the upper trustworthiness limit of DNN-based software vulnerability detection. To the best of our knowledge, this is the first work exploring the bane of software vulnerability detection using backdoor triggers.

Full Text