Testing is expensive, and batching tests has the potential to reduce test costs. The continuous integration strategy of testing each commit or change individually helps to quickly identify faults but leads to the maximum number of test executions. Large companies that have a massive number of commits, e.g., Google and Facebook, or that have expensive test infrastructure, e.g., Ericsson, must batch changes together to reduce the total number of test runs. For example, if eight builds are batched together and there is no failure, then we have tested eight builds with one execution, saving seven executions. However, when a failure occurs, it is not immediately clear which build caused it. A bisection is run to isolate the failing build, i.e., the culprit build. In our eight-build example, a failure requires an additional six executions, for seven in total, resulting in a saving of only one execution. In this work, we re-evaluate batching approaches developed in industry on large open source projects using Travis CI. We also introduce novel batching approaches. In total, we evaluate six approaches. The first is the baseline approach that tests each build individually. The second is the existing bisection approach. The third uses a batch size of four, which we show mathematically reduces the number of executions without requiring bisection. The fourth combines the two prior techniques, introducing a stopping condition to the bisection. The final two approaches use models of build change risk to isolate risky changes and test them in smaller batches. We find that compared to the TestAll baseline, on average, the approaches reduce the number of <i>build test executions</i> across projects by 46, 48, 50, 44, and 49 percent for BatchBisect, Batch4, BatchStop4, RiskTopN, and RiskBatch, respectively. The greatest reduction in executions is achieved by BatchStop4, at 50 percent. However, the simple approach of Batch4 does not require bisection and achieves a reduction of 48 percent. 
In a larger sample of projects, we find that a project’s failure rate is strongly negatively correlated with execution savings (Spearman <inline-formula><tex-math notation="LaTeX">$r = -0.97$</tex-math></inline-formula> with <inline-formula><tex-math notation="LaTeX">$p \ll 0.001$</tex-math></inline-formula>). Using Batch4, 85 percent of projects see savings. All projects whose builds fail less than 40 percent of the time will benefit from batching. In terms of <i>feedback time</i>, compared to TestAll, we find that BatchBisect, Batch2, Batch4, and BatchStop4 reduce the average feedback time by 33, 16, 32, and 37 percent, respectively. Simple batching not only saves resources but also reduces feedback time without introducing any slip-throughs and without changing the test run order. We suggest that most projects should adjust their CI pipelines to use a batch size of at least two. We release our scripts and data for replication <sup>1</sup> as well as the <monospace>BatchBuilder</monospace> tool <sup>2</sup> that automatically batches submitted changes on GitHub for testing on Travis CI. Since the tool reports individual results for each pull request or pushed commit, the batching happens in the background and the development process is unchanged.
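The execution counts in the eight-build example can be reproduced with a small sketch. It is an illustration under stated assumptions, not the paper's implementation: the function name `count_executions` and the `stop_size` parameter are hypothetical, a batch is assumed to fail exactly when at least one of its builds fails, and bisection is assumed to test both halves of a failing batch. Setting `stop_size` to four is one possible reading of a stopping condition, where a small failing batch is tested build by build instead of being split further.

```python
from typing import Sequence


def count_executions(passes: Sequence[bool], stop_size: int = 1) -> int:
    """Count the test executions needed to classify every build in a batch.

    passes[i] is True if build i passes on its own (hypothetical input).
    Assumption: a batch fails exactly when at least one of its builds fails.
    """
    if not passes:
        return 0
    executions = 1  # one execution for the batch as a whole
    if all(passes) or len(passes) == 1:
        # A passing batch clears every build in it; a single build is
        # classified directly by its own execution.
        return executions
    if len(passes) <= stop_size:
        # Assumed stopping condition: test each build individually.
        return executions + len(passes)
    mid = len(passes) // 2
    # Bisection: test both halves and recurse into whichever fails.
    return (executions
            + count_executions(passes[:mid], stop_size)
            + count_executions(passes[mid:], stop_size))


# Eight builds with no failure: a single execution covers all of them.
no_failure = count_executions([True] * 8)
# Eight builds with one culprit: one batch run plus six bisection runs.
one_culprit = count_executions([True] * 5 + [False] + [True] * 2)
print(no_failure, one_culprit)
```

Under these assumptions, a failing batch of four with a single culprit costs five executions whether it is bisected (`stop_size=1`) or tested build by build (`stop_size=4`), which is consistent with the claim that a batch size of four does not need bisection.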