ABSTRACT

Measurement‐driven system development uses quantitative data to evaluate capabilities, benefits, progress, and tradeoffs, and to identify improvement opportunities. This paper describes a controlled study of software testing effectiveness that focuses on combining individual testing techniques into team‐based testing strategies. The analysis is intended to enable measurement‐driven process improvement by characterizing how testing effectiveness relates to several factors, including testing strategy, software type, and developer expertise. In this study, a representative group of software development professionals applied common testing techniques to different types of software. The study compares the six possible team combinations of three testing techniques: (1) code reading by stepwise abstraction, (2) functional testing using equivalence partitioning and boundary value analysis, and (3) structural testing using a 100% statement coverage criterion. Thirty‐two professional developers applied the techniques to three unit‐sized programs in a fractional factorial experimental design.

The major results of this study are the following. The six combined testing strategies detected 17% more of the programs' faults on average than did the three single techniques, which was a 35% improvement in fault detection. The highest percentages of the programs' faults were detected by a combination of either two code readers or a code reader and a functional tester. However, a pairing of two code readers detected more faults per hour than did a pairing of a code reader and a functional tester. Pairing two individuals of advanced expertise resulted in the highest percentage of faults being detected. Overall, the most cost‐effective testing strategy (measured as the number of faults detected per hour) was code reading applied by a single individual.
The most cost‐effective combined testing strategy paired a code reader with either another code reader or a structural tester. Both the percentage of faults detected and the fault detection cost‐effectiveness depended on the type of software being tested. In conclusion, we outline future research directions that build on these strategies and ideas.
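
As an illustration of where the count of six team combinations comes from, the following minimal Python sketch enumerates all two‐person pairings of the three techniques, allowing both members to use the same technique (as in the "two code readers" pairing). The names `TECHNIQUES` and `team_strategies` are hypothetical and chosen only for this example; they do not come from the study itself.

```python
from itertools import combinations_with_replacement

# The three individual techniques named in the abstract.
TECHNIQUES = ("code reading", "functional testing", "structural testing")


def team_strategies(techniques=TECHNIQUES, team_size=2):
    """Return every unordered team of `team_size` techniques.

    Repetition is allowed, so a team may apply the same technique twice
    (for example, two code readers).
    """
    return list(combinations_with_replacement(techniques, team_size))


strategies = team_strategies()
for pair in strategies:
    print(" + ".join(pair))
```

Three techniques taken two at a time with repetition give 3 same‐technique pairs plus 3 mixed pairs, which matches the six combined strategies compared in the study.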