Abstract

Performance tests can be run automatically on each commit with a continuous integration system: after code changes are pushed to the repository, multiple benchmark tests measure the system's performance, and change point detection analyzes the outcomes to automatically identify commits that significantly change performance. However, a considerable number of automatically detected change points are tagged as not actionable (false positives). Validating potential change points by hand is typically difficult and time-consuming, and thus creates a bottleneck in the testing process. Our work focuses on identifying which factors affect the triage of performance change points and on automatically triaging whether a newly detected change point is a true positive. We start by extracting 34 features across four dimensions, i.e., Configuration, Time Series, Version, and Context, and use a random forest classifier to triage change points based on the proposed features. The results indicate that most of the proposed features differ significantly between true positive and false positive change points. On average, our model obtains an AUC of 0.881, which is statistically significantly better than two state-of-the-art approaches. We also examine the most important features for triage and find that version-related features are the most influential.
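The abstract does not specify the model implementation, so the following is only a minimal sketch of the general approach it describes: train an ensemble classifier on per-change-point feature vectors labeled true/false positive, then evaluate with ROC AUC. The feature values, labels, and the stump-based forest below are all illustrative stand-ins (the paper uses 34 real features and a full random forest), kept to the standard library so the idea is self-contained.

```python
import random

def auc_score(labels, scores):
    """Rank-based ROC AUC (Mann-Whitney statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_stump(X, y, feat):
    """Pick the threshold/direction on one feature with lowest training error."""
    best = None
    for t in sorted({row[feat] for row in X}):
        for sign in (1, -1):
            preds = [1 if sign * (row[feat] - t) > 0 else 0 for row in X]
            err = sum(p != yy for p, yy in zip(preds, y))
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return feat, t, sign

def fit_forest(X, y, n_trees=25, seed=0):
    """Toy 'random forest': bootstrap-sampled one-feature stumps."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of rows
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        feat = rng.randrange(d)                      # random feature per tree
        forest.append(fit_stump(Xb, yb, feat))
    return forest

def score(forest, row):
    """Fraction of stumps voting 'true positive change point'."""
    votes = sum(1 if sign * (row[feat] - t) > 0 else 0
                for feat, t, sign in forest)
    return votes / len(forest)

# Synthetic data: feature 0 carries the signal, feature 1 is noise.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]
X_tr, y_tr, X_te, y_te = X[:150], y[:150], X[150:], y[150:]

forest = fit_forest(X_tr, y_tr)
auc = auc_score(y_te, [score(forest, row) for row in X_te])
print(f"held-out AUC: {auc:.3f}")
```

In practice one would reach for a library implementation (e.g. a full random forest) and cross-validated AUC, and obtain feature importances from the trained model, which is how the abstract's "most important features" analysis is typically done.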
