Rigorous benchmarking in reasonable time

Tomas Kalibera, Richard Jones

doi:10.1145/2555670.2464160

Abstract

Experimental evaluation is key to systems research. Because modern systems are complex and non-deterministic, good experimental methodology demands that researchers account for uncertainty. To obtain valid results, they are expected to run many iterations of benchmarks, invoke virtual machines (VMs) several times, or even rebuild VM or benchmark binaries more than once. All this repetition costs time to complete experiments. Currently, many evaluations give up on sufficient repetition or rigorous statistical methods, or even run benchmarks only in training sizes. The results reported often lack proper variation estimates and, when a small difference between two systems is reported, some are simply unreliable. In contrast, we provide a statistically rigorous methodology for repetition and summarising results that makes efficient use of experimentation time. Time efficiency comes from two key observations. First, a given benchmark on a given platform is typically prone to much less non-determinism than the common worst-case of published corner-case studies. Second, repetition is most needed where most uncertainty arises (whether between builds, between executions or between iterations). We capture experimentation cost with a novel mathematical model, which we use to identify the number of repetitions at each level of an experiment necessary and sufficient to obtain a given level of precision. We present our methodology as a cookbook that guides researchers on the number of repetitions they should run to obtain reliable results. We also show how to present results with an effect size confidence interval. As an example, we show how to use our methodology to conduct throughput experiments with the DaCapo and SPEC CPU benchmarks on three recent platforms.
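To make the idea of "repetition where the uncertainty arises" concrete, here is a minimal sketch for a two-level experiment (executions, each with several iterations). It is not the paper's cost model: it assumes a balanced design, uses the textbook method-of-moments variance decomposition, and applies the classic two-stage sampling rule for the cost-optimal number of iterations per execution. The function names and the cost parameters `c_iter` and `c_exec` are hypothetical, introduced only for illustration.

```python
import math
import statistics


def variance_components(runs):
    """Estimate within- and between-execution variance.

    runs: list of executions, each a list of iteration times.
    Assumes a balanced design (same number of iterations everywhere).
    """
    n1 = len(runs[0])
    if n1 < 2 or len(runs) < 2:
        raise ValueError("need at least 2 executions with 2 iterations each")
    if any(len(r) != n1 for r in runs):
        raise ValueError("balanced design expected")

    exec_means = [statistics.fmean(r) for r in runs]
    # Average sample variance of iterations inside one execution.
    s1_sq = statistics.fmean([statistics.variance(r) for r in runs])
    # Sample variance of the per-execution means.
    s2_sq = statistics.variance(exec_means)

    t1_sq = s1_sq
    # The variance of execution means still contains s1_sq / n1 of
    # iteration noise; subtract it and clamp at zero.
    t2_sq = max(s2_sq - s1_sq / n1, 0.0)
    return t1_sq, t2_sq


def optimal_iterations(t1_sq, t2_sq, c_iter, c_exec):
    """Cost-optimal iterations per execution (classic two-stage rule).

    c_iter: assumed cost of one extra iteration.
    c_exec: assumed fixed cost of starting one extra execution.
    Returns None when no between-execution variance was detected,
    in which case repeating executions buys nothing under this model.
    """
    if t2_sq == 0.0:
        return None
    return max(1, math.ceil(math.sqrt((c_exec / c_iter) * (t1_sq / t2_sq))))


# Hypothetical timings: 3 executions x 3 iterations.
runs = [[10.0, 10.2, 9.8], [11.0, 11.2, 10.8], [9.0, 9.2, 8.8]]
t1_sq, t2_sq = variance_components(runs)
n_iter = optimal_iterations(t1_sq, t2_sq, c_iter=1.0, c_exec=100.0)
```

In this made-up data most of the variation lies between executions, so the rule recommends only a few iterations per execution even when starting an execution is assumed to be 100 times as expensive as one iteration. The remaining budget is better spent on more executions.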

Journal: ACM SIGPLAN Notices
Publication Date: Jun 20, 2013
Citations: 15
