Abstract

Although many techniques for testing software have been proposed over the last twenty years, there is still not enough solid evidence to indicate which (if any) of these techniques are effective. It is difficult to perform meaningful comparisons of the cost and effectiveness of testing techniques; in fact, even defining these terms in a meaningful way is problematic. Consider an erroneous program P, its specification S, and a test data adequacy criterion C (such as 100% branch coverage). Even if we restrict the size of the test sets to be considered, there is a huge number of different test sets that satisfy criterion C for P and S. Since these adequate test sets typically have different properties, investigating effectiveness (or other properties) rigorously requires considering the entire space of test sets (according to some reasonable probability distribution) and using appropriate probabilistic analysis and/or statistical sampling techniques.

In earlier research, supported by NSF Grant CCR-9206910, we developed analytical tools and an experiment design to address these issues and applied them to comparing a number of well-known testing techniques. The primary measure of effectiveness considered was the probability that an adequate test set would detect at least one fault, and most of the subject programs were fairly small. The main thread of this research project extends that work in several directions: additional measures of cost and effectiveness are considered, analytical and experimental tools are developed for these measures, and experiments are conducted on larger programs.
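To make the sampling idea concrete, the following is a minimal Python sketch (not the authors' actual experiment design) of how one might estimate the probability that a randomly chosen C-adequate test set detects at least one fault. The helpers `satisfies_criterion` and `detects_fault` are hypothetical placeholders for an adequacy check (e.g., 100% branch coverage) and an oracle comparison against the specification.

```python
import random

def estimate_detection_probability(candidate_tests, satisfies_criterion,
                                   detects_fault, set_size, num_samples=10_000):
    """Monte Carlo estimate of the probability that a C-adequate test set
    of a fixed size detects at least one fault in the program under test.

    candidate_tests:     pool of individual test cases to draw from
    satisfies_criterion: predicate deciding whether a test set is C-adequate
    detects_fault:       predicate deciding whether a single test case fails
    """
    adequate_seen = 0
    detecting = 0
    while adequate_seen < num_samples:
        # Draw a candidate test set of the restricted size.
        test_set = random.sample(candidate_tests, set_size)
        # Keep only sets that satisfy the adequacy criterion C.
        if not satisfies_criterion(test_set):
            continue
        adequate_seen += 1
        # The effectiveness measure: at least one test in the set fails.
        if any(detects_fault(t) for t in test_set):
            detecting += 1
    return detecting / adequate_seen
```

In practice, rejection sampling over adequate sets can be expensive, which is one reason analytical tools and carefully designed sampling distributions matter for this kind of study.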
