Abstract

Software products frequently evolve. When the production code undergoes major changes, such as feature addition or removal, the corresponding test code typically should co-evolve. Otherwise, outdated tests may be ineffective at revealing faults or may cause spurious test failures, which can confuse developers and waste QA resources. Despite its importance, maintaining such co-evolution can be time- and resource-consuming. Existing work has shown that, in practice, test code often fails to co-evolve with the production code. To facilitate the co-evolution of production and test code, this work explores how to automatically identify outdated tests. To gain insights into the problem, we conducted an empirical study on 975 open-source Java projects. By manually analyzing and comparing the positive cases, where the test code co-evolves with the production code, and the negative cases, where no co-evolution is observed, we found that various factors (e.g., the different language constructs modified in the production code) can determine whether the test code should be updated. Guided by the empirical findings, we propose a machine learning-based approach, SITAR, that holistically considers these factors to predict test changes. We evaluated SITAR on 20 popular Java projects. The results show that SITAR, under the within-project setting, achieves an average precision of 81.4% and an average recall of 76.1% for identifying test code that requires updating, significantly outperforming rule-based baseline methods. SITAR also achieves promising results under the cross-project setting and under multiclass prediction, which predicts the exact change types of the test code.
