VDTriplet: Vulnerability detection with graph semantics using triplet model

Hao Sun, Lei Cui, Lun Li, Zhenquan Ding, Siyuan Li, Zhiyu Hao, Hongsong Zhu

doi:10.1016/j.cose.2024.103732

Abstract

This study presents VDTriplet, a novel learning framework for building vulnerability detection models. VDTriplet is the first attempt to use deep learning to avoid misjudging functions that contain known vulnerabilities, a risk that arises because a vulnerable function and its fixed version differ only slightly. Unlike prior work that treats the program as sequential tokens or randomly initialized graphs for supervised binary classification, our model not only fuses rich syntactic and semantic information to obtain an accurate program representation, but also uses the TripletNN model to reduce the misjudgment of potential known vulnerabilities. VDTriplet first extracts the subgraphs that cause the vulnerability, guided by typical programming errors, to reduce redundant code. Then, it uses a pre-trained model and an unsupervised model to encode the subgraphs, thereby minimizing the influence of randomly initialized graph nodes and avoiding the need for supervised labeling. Finally, the TripletNN model minimizes the distance between potential vulnerabilities and vulnerabilities of the same type, and maximizes the distance between potential vulnerabilities and fixed vulnerabilities, to reduce false positives. The results show that VDTriplet performs significantly better than the studied baselines. Compared with the best-performing model in the literature, our model improves Accuracy, Precision, Recall, and F1-Score on the test set by 4.89%, 4.23%, 4.56%, and 5.34%, respectively. Moreover, it generalizes well when detecting vulnerabilities in eight new applications, demonstrating its potential value in practical use. Overall, these gains represent a substantial improvement over prior approaches.
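The distance objective described for TripletNN can be illustrated with a minimal sketch. It assumes the standard margin-based triplet loss with Euclidean distance; the abstract does not state the exact loss, so the formula, the margin value, the names `euclidean` and `triplet_loss`, and the toy three-dimensional embeddings are all assumptions made for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (assumed form, not taken from the paper).

    anchor:   embedding of the potential vulnerability
    positive: embedding of a known vulnerability of the same type
    negative: embedding of the corresponding fixed function
    """
    d_pos = euclidean(anchor, positive)  # distance to pull down
    d_neg = euclidean(anchor, negative)  # distance to push up
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical embeddings, chosen only to illustrate the two cases.
candidate = [0.9, 0.1, 0.0]
known_vuln = [1.0, 0.0, 0.0]
patched = [0.0, 1.0, 0.0]

# The candidate is already much closer to the known vulnerability than to the
# fixed version, so the margin is satisfied and the loss vanishes.
well_separated = triplet_loss(candidate, known_vuln, patched)

# Swapping the roles violates the margin and yields a positive loss.
violated = triplet_loss(candidate, patched, known_vuln)
```

In this sketch the loss is zero once the candidate sits closer to the same-type vulnerability than to the fixed function by at least the margin, which is the separation the abstract credits with reducing false positives.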

Full Text