Abstract

We show that a widely used benchmark set for the comparison of static-analysis tools exhibits an impressive number of weaknesses, and that the internationally accepted quantitative-evaluation metrics may lead to useless results. The weaknesses in the benchmark set were identified by applying a sound static analysis to the programs in this set and carefully interpreting the results. We propose how to deal with the weaknesses of the quantitative metrics and how to improve such benchmarks and the evaluation process, in particular for external evaluations, in which an ideally neutral institution performs the evaluation and produces results that potential clients can trust. We also show that sufficiently high quality of the test cases makes an automatic evaluation of results possible.
