Abstract

Regression testing comprises techniques that are applied during software evolution to uncover faults effectively and efficiently. While regression testing is widely studied for functional tests, performance regression testing, e.g., with software microbenchmarks, is hardly investigated. Applying test case prioritization (TCP), a regression testing technique, to software microbenchmarks may help capture large performance regressions sooner in new versions. This may be especially beneficial for microbenchmark suites, because they take considerably longer to execute than unit test suites. However, it is unclear whether traditional unit testing TCP techniques work equally well for software microbenchmarks. In this paper, we empirically study coverage-based TCP techniques, employing total and additional greedy strategies, applied to software microbenchmarks along multiple parameterization dimensions, leading to 54 unique technique instantiations. We find that TCP techniques have a mean APFD-P (average percentage of fault-detection on performance) effectiveness between 0.54 and 0.71 and are able to capture the three largest performance changes after executing 29% to 66% of the whole microbenchmark suite. Our efficiency analysis reveals that the runtime overhead of TCP varies considerably depending on the exact parameterization. The most effective technique has an overhead of 11% of the total microbenchmark suite execution time, making TCP a viable option for performance regression testing. The results demonstrate that the total strategy is superior to the additional strategy. Finally, dynamic-coverage techniques should be favored over static-coverage techniques due to their acceptable analysis overhead; however, in settings where the time for prioritization is limited, static-coverage techniques provide an attractive alternative.
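To make the two greedy strategies named above concrete, the following is a minimal sketch of coverage-based prioritization under the assumption that per-benchmark coverage is available as sets of covered code units. It is an illustration only, not the authors' implementation, and all class, method, and variable names are hypothetical.

```java
import java.util.*;

// Illustrative sketch of coverage-based greedy TCP (not the study's actual tooling).
// Input: a map from benchmark name to the set of code units (e.g., methods) it covers.
public class GreedyPrioritization {

    // "Total" strategy: rank benchmarks by the total number of covered units.
    static List<String> totalStrategy(Map<String, Set<String>> coverage) {
        List<String> order = new ArrayList<>(coverage.keySet());
        order.sort(Comparator.comparingInt((String b) -> coverage.get(b).size()).reversed());
        return order;
    }

    // "Additional" strategy: repeatedly pick the benchmark that covers the most
    // units not yet covered by the benchmarks selected so far.
    static List<String> additionalStrategy(Map<String, Set<String>> coverage) {
        List<String> order = new ArrayList<>();
        Set<String> covered = new HashSet<>();
        Set<String> remaining = new HashSet<>(coverage.keySet());
        while (!remaining.isEmpty()) {
            String best = null;
            int bestGain = -1;
            for (String b : remaining) {
                Set<String> gain = new HashSet<>(coverage.get(b));
                gain.removeAll(covered);
                if (gain.size() > bestGain) {
                    bestGain = gain.size();
                    best = b;
                }
            }
            order.add(best);
            covered.addAll(coverage.get(best));
            remaining.remove(best);
        }
        return order;
    }
}
```

Whether the covered units are statements, methods, or classes, and whether coverage is collected dynamically or approximated statically, are exactly the kinds of parameterization dimensions the paper varies.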

Highlights

  • Regression testing approaches help developers uncover faults in new software versions compared to previous versions

  • This paper presents the first investigation of whether standard test case prioritization (TCP) techniques from unit testing research are applicable to software microbenchmarks

  • The unique combinations of the studied independent variables result in 54 different TCP techniques, which we evaluate on a large Java Microbenchmark Harness (JMH) data set comprising 10 Java open-source software (OSS) projects across 161 versions, with 1,829 distinct microbenchmarks and 6,460 distinct parameterizations


Summary

Introduction

Regression testing approaches help developers uncover faults in new software versions compared to previous versions. One such approach is test case prioritization (TCP): it reorders tests so that the most important ones are executed first, to find faults sooner on average. The unit-testing-equivalent technique for testing performance is software microbenchmarking. Software microbenchmark suites take substantially longer to execute than unit test suites, often requiring multiple hours or even days (Huang et al. 2014; Stefan et al. 2017; Laaber and Leitner 2018), which is a compelling reason to apply TCP to capture important performance changes sooner. Previous research on performance regression testing has focused on predicting the performance impact of code changes on commits to decide whether performance tests should be run at all (Huang et al. 2014; Sandoval Alcocer et al. 2016), on prioritizing microbenchmarks according to the expected performance change size (Mostafa et al. 2017), and on selecting microbenchmarks that are most likely to detect a performance regression (de Oliveira et al. 2017; Alshoaibi et al. 2019; Chen et al. 2020).
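For readers unfamiliar with software microbenchmarking, the sketch below shows a minimal JMH benchmark of the kind prioritized in this study; the class and the measured operation are invented for illustration and do not come from the studied projects.

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

import java.util.concurrent.TimeUnit;

// Hypothetical JMH microbenchmark: the harness invokes the annotated method
// repeatedly across warmup and measurement iterations and reports its average latency.
public class StringConcatBenchmark {

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public String concatWithPlus() {
        // Returning the result prevents the JIT from eliminating the measured work.
        return "foo" + System.nanoTime();
    }
}
```

Because each such benchmark must be executed many times to obtain statistically stable measurements, whole suites quickly reach the multi-hour runtimes cited above, which motivates prioritizing them.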
