Abstract

When evaluating Static Application Security Testing (SAST) tools, benchmarks built from real-world software are considered more representative than synthetic micro-benchmarks. Because they are derived from real-world code, the test cases in such benchmarks typically contain multiple syntactic features that affect vulnerability detection results and thus reflect SAST tools' capabilities in real-world settings. However, most existing benchmarks based on real-world software pay little attention to these syntactic features, so only limited information about the capabilities of SAST tools can be obtained from the evaluation results. In this paper, we present a method for constructing benchmarks and evaluating SAST tools that leverages syntactic features to make the evaluation more explainable. To demonstrate its effectiveness, we applied the method to the benchmark built by Misha Zitser et al., generated 10 groups of test cases, and evaluated 2 SAST tools with them. The results show that, with a benchmark constructed by our method, the evaluation is more explainable and yields more information about the SAST tools' vulnerability detection capabilities.
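
As a minimal illustration of the kind of syntactic feature the abstract refers to, the sketch below shows the same out-of-bounds write expressed directly and then reached through a function call and pointer arithmetic; the variant names and feature labels are illustrative assumptions, not terms or test cases from the paper.

```c
/* Illustrative only: two test cases containing the same out-of-bounds
 * write but differing in the syntactic features on the path to the bug.
 * The labels "direct index" and "interprocedural pointer arithmetic"
 * are assumptions for illustration, not taken from the paper. */

/* Variant 1: the overflow is syntactically simple (direct index). */
void direct_write(void) {
    char buf[8];
    buf[8] = 'A';             /* out-of-bounds write in one statement */
}

/* Variant 2: the same overflow reached through a call and pointer
 * arithmetic; a tool that flags variant 1 may or may not flag this. */
static void store(char *p, int off, char c) {
    *(p + off) = c;           /* out of bounds when off >= buffer size */
}

void indirect_write(void) {
    char buf[8];
    store(buf, 8, 'A');
}
```

Grouping test cases by such features is what lets an evaluation attribute a missed detection to a specific syntactic construct rather than to the tool as a whole.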
