Modeling the performance of a highly configurable software system requires capturing the influences of its configuration options and their interactions on the system’s performance. Performance-influence models quantify these influences, explaining this way the performance behavior of a configurable system as a whole. To be useful in practice, a performance-influence model should have a low prediction error, small model size, and reasonable computation time. Because of the inherent tradeoffs among these properties, optimizing for one property may negatively influence the others. It is unclear, though, to what extent these tradeoffs manifest themselves in practice, that is, whether a large configuration space can be described accurately only with large models and significant resource investment. By means of 10 real-world highly configurable systems from different domains, we have systematically studied the tradeoffs between the three properties. Surprisingly, we found that the tradeoffs between prediction error and model size and between prediction error and computation time are rather marginal. That is, we can learn accurate and small models in reasonable time, so that one performance-influence model can fit different use cases, such as program comprehension and performance prediction. We further investigated the reasons for why the tradeoffs are marginal. We found that interactions among four or more configuration options have only a minor influence on the prediction error and that ignoring them when learning a performance-influence model can save a substantial amount of computation time, while keeping the model small without considerably increasing the prediction error. This is an important insight for new sampling and learning techniques as they can focus on specific regions of the configuration space and find a sweet spot between accuracy and effort. We further analyzed the causes for the configuration options and their interactions having the observed influences on the systems’ performance. We were able to identify several patterns across subject systems, such as dominant configuration options and data pipelines, that explain the influences of highly influential configuration options and interactions, and give further insights into the domain of highly configurable systems.
Read full abstract