Regression test case reduction aims to select a representative subset of the original test pool while retaining as much fault detection capability as possible. Cluster analysis has been proposed and applied for selecting an effective test case subset in regression testing. It groups test cases into clusters based on the similarity of their historical execution profiles. In previous studies, historical execution profiles are represented as binary or numeric function coverage vectors. These vector-based similarity approaches consider only which functions or statements are covered and how many times they are executed; they do not take the relations and sequential information between function calls into account. In this paper, we propose cluster analysis of function call sequences to further improve the fault detection effectiveness of regression testing. A test is represented as a function call sequence, which captures the relations and sequential information between function calls. The distance between function call sequences is measured using both the Levenshtein distance and the Euclidean distance. To assess the effectiveness of our approaches, we designed and conducted experimental studies on five subject programs. The experimental results indicate that, in terms of fault detection effectiveness, our approaches are statistically superior to the approaches based on vector similarity (i.e., binary vectors and numeric vectors) and to the random and greedy function-coverage-maximization test case reduction techniques. With respect to cost-effectiveness, cluster analysis of sequences measured using the Euclidean distance is more effective than cluster analysis using the Levenshtein distance.
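To illustrate why sequential information can matter, the following minimal Python sketch compares two hypothetical tests that call the same functions the same number of times but in a different order. It is an illustration under stated assumptions, not the implementation used in the study: the function names, the test sequences, and the helpers `levenshtein` and `count_vector_distance` are invented for this example, and the Euclidean distance is applied here only to numeric coverage vectors (call counts), since the abstract does not specify how sequences are encoded for the Euclidean measure.

```python
from collections import Counter
from math import sqrt


def levenshtein(a, b):
    """Edit distance between two sequences of function names.

    Each insertion, deletion, or substitution of one call costs 1.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i calls of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def count_vector_distance(a, b):
    """Euclidean distance between numeric coverage vectors (call counts)."""
    ca, cb = Counter(a), Counter(b)
    funcs = set(ca) | set(cb)
    return sqrt(sum((ca[f] - cb[f]) ** 2 for f in funcs))


# Hypothetical execution profiles: t1 and t2 differ only in call order.
t1 = ["open", "read", "parse", "close"]
t2 = ["open", "parse", "read", "close"]
t3 = ["open", "read", "close"]

print(count_vector_distance(t1, t2))  # 0.0: the count vectors are identical
print(levenshtein(t1, t2))            # 2: the reordering is visible
print(levenshtein(t1, t3))            # 1: one call is missing
```

In this sketch the count vectors cannot tell `t1` and `t2` apart, whereas the sequence distance can, which is the kind of difference a sequence-based clustering would be able to exploit.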