Software often needs to exist in different variants that account for varying customer requirements, environments, or non-functional aspects such as energy consumption. Unfortunately, the number of variants can grow exponentially with the number of features. Developing and evolving variant-rich systems is therefore challenging, since they evolve not only “in time,” as single systems do, but also “in space,” with new variants. Fortunately, many methods and tools for variant-rich systems have been proposed over the last decades, especially in the field of software product line engineering. However, their level of evaluation varies significantly, threatening their relevance for practitioners and for future research. Many tools have only been evaluated on ad hoc datasets, minimal examples, or unrealistic and limited evolution scenarios, missing large parts of the actual evolution lifecycle of variant-rich systems.

Our long-term goal is to provide benchmarks that increase the maturity with which methods and tools for evolving variant-rich systems are evaluated. However, manually curating sufficiently detailed benchmarks that cover the whole evolution lifecycle of variant-rich systems is challenging. We present vpbench, a framework that simulates the evolution of a variant-rich system and thereby generates an evolution history enriched with metadata explaining the evolution. The generated benchmarks, i.e., the evolution histories and metadata, can serve as ground truth against which the results of tools applied to them can be checked. We formalize the claims we make about the generator and the generated benchmarks as requirements. The design of vpbench comprises modular generators and evolution operators that automatically evolve real codebases. We implement both simple and advanced evolution operators, the latter relying, e.g., on code transplantation to incorporate features from real projects. We demonstrate how vpbench addresses its claimed requirements, considering multiple degrees of realism as well as the extensibility and language independence of the generated benchmarks.
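To make the operator concept concrete, the following is a minimal sketch, assuming a Python implementation; the names (EvolutionOperator, AddPreprocessorFeature, EvolutionStep) and the #ifdef-based toy operator are hypothetical illustrations of the idea, not vpbench’s actual API.

```python
# Hypothetical sketch: evolution operators that evolve a codebase and record
# metadata, so the resulting history can serve as a ground-truth benchmark.
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class EvolutionStep:
    """One recorded step of the simulated history, with explanatory metadata."""
    operator: str
    touched_files: list[Path]
    metadata: dict = field(default_factory=dict)

class EvolutionOperator(ABC):
    """An operator that applies one evolution to a codebase and reports what it did."""
    @abstractmethod
    def apply(self, codebase: Path) -> EvolutionStep: ...

class AddPreprocessorFeature(EvolutionOperator):
    """Toy operator: guard a file's contents behind a new #ifdef feature flag."""
    def __init__(self, feature: str, target: Path):
        self.feature, self.target = feature, target

    def apply(self, codebase: Path) -> EvolutionStep:
        path = codebase / self.target
        source = path.read_text()
        path.write_text(f"#ifdef {self.feature}\n{source}\n#endif  // {self.feature}\n")
        return EvolutionStep(
            operator=type(self).__name__,
            touched_files=[self.target],
            metadata={"feature": self.feature},  # ground truth for checking tools
        )

def simulate(codebase: Path, operators: list[EvolutionOperator]) -> list[EvolutionStep]:
    """Apply operators in sequence, yielding the metadata-enriched evolution history."""
    return [op.apply(codebase) for op in operators]
```

In such a design, each operator is self-describing: the metadata it emits states which variation points it introduced, which is what allows the generated history to be used as ground truth for tools analyzed against it.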