CSVD-TF: Cross-project software vulnerability detection with TrAdaBoost by fusing expert metrics and semantic metrics

Zhilong Cai, Guilong Lu, Wenlong Pei, Junjie Zhao, Xiang Chen, Yongwei Cai

doi:10.1016/j.jss.2024.112038

Abstract

Recently, deep learning-based software vulnerability detection (SVD) approaches have achieved promising performance. However, the scarcity of high-quality labeled SVD data limits the practicality of these approaches. Cross-project software vulnerability detection (CSVD) has therefore gradually attracted the attention of researchers, since it can use labeled SVD data from a source project to construct an effective model for the target project via transfer learning. In practice, security experts can often label a certain number of program modules in the target project, and these labels can improve CSVD model performance by helping the model make effective use of similar SVD data in the source project. For this more practical CSVD scenario, we propose a novel approach, CSVD-TF, based on the transfer learning method TrAdaBoost. Moreover, we find that expert metrics and semantic metrics extracted from functions are complementary to a certain extent in our investigated scenario. Therefore, we use a model-level metric fusion method to further improve performance. We perform a comprehensive study to evaluate the effectiveness of CSVD-TF on four real-world projects. Our empirical results show that CSVD-TF achieves performance improvements of 7.5% to 24.6% in terms of AUC when compared to five state-of-the-art baselines.
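To make the two ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. `tradaboost_update` follows the instance-reweighting step of the original TrAdaBoost algorithm (Dai et al., 2007): source instances that the current weak learner misclassifies are down-weighted, while misclassified labeled target instances are up-weighted. `fuse_probabilities` assumes that model-level fusion means a weighted average of the vulnerability probabilities predicted by an expert-metric model and a semantic-metric model; the abstract does not specify the fusion rule, so this choice, the function names, and the `alpha` parameter are all hypothetical.

```python
import math


def tradaboost_update(weights, errors, n_source, n_rounds):
    """One TrAdaBoost reweighting step (after Dai et al., 2007).

    weights:  current instance weights, source instances first, then target.
    errors:   per-instance |h(x_i) - y_i| in [0, 1] for the current weak learner.
    n_source: number of source-project instances (must be >= 1).
    n_rounds: total number of boosting rounds.
    """
    target_w = weights[n_source:]
    target_e = errors[n_source:]
    # The weighted error is measured on the labeled target instances only.
    eps = sum(w * e for w, e in zip(target_w, target_e)) / sum(target_w)
    if not 0.0 < eps < 0.5:
        raise ValueError("weak learner error on target must lie in (0, 0.5)")
    beta_t = eps / (1.0 - eps)
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))
    updated = []
    for i, (w, e) in enumerate(zip(weights, errors)):
        if i < n_source:
            # Misclassified source instances look dissimilar to the target: shrink.
            updated.append(w * beta_src ** e)
        else:
            # Misclassified target instances need more attention: grow.
            updated.append(w * beta_t ** (-e))
    return updated


def fuse_probabilities(p_expert, p_semantic, alpha=0.5):
    """Assumed fusion rule: weighted average of the two models' probabilities."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_expert, p_semantic)]


# Two source instances followed by three labeled target instances.
new_weights = tradaboost_update([0.2] * 5, [1, 0, 1, 0, 0], n_source=2, n_rounds=10)
fused = fuse_probabilities([0.9, 0.2], [0.5, 0.4])
```

In a full training loop, the weights would be renormalized each round and passed to the weak learner as sample weights; that loop and the feature extraction are omitted here.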

Full Text