Before new software-testing techniques can achieve widespread acceptance, their effectiveness at detecting defects must be evaluated. The most common way of evaluating testing techniques is with empirical studies, in which one or more techniques are tried out on software with known defects. However, the defects used can affect the performance of the techniques. To complicate matters, it is not even clear how to effectively describe or characterize defects. To address these problems, this article describes an experiment architecture for empirically evaluating testing techniques that takes both defect and test-suite characteristics into account. As a proof of concept, an experiment on GUI-testing techniques is conducted. It provides evidence that the proposed defect characteristics do help explain defect detection, at least for GUI testing, and it explores the relationship between the coverage of defective code and the detection of defects.