We propose a novel methodology to validate software product line (PL) models by integrating Statistical Model Checking (SMC) with Process Mining (PM). We consider the feature-oriented language QFLan from the PL engineering domain. QFLan allows modeling PLs equipped with rich cross-tree and quantitative constraints, as well as aspects of dynamic PLs such as staged configurations. This richness makes it easy to obtain models with infinite state spaces, as is the case for our running example, calling for simulation-based analysis techniques like SMC. SMC is a family of analysis techniques based on the generation of samples of the dynamics of a system. It aims at estimating properties of a system, such as the probability of a given event (e.g., installing a feature) or the expected value of quantities in it (e.g., the average price of products from the studied family). PM, instead, is a family of data-driven techniques that use logs collected during the execution of an information system to identify and reason about its underlying execution process. This often involves identifying and reasoning about process patterns, bottlenecks, and possibilities for improvement. In this paper we propose, for the first time to the best of our knowledge, the application of PM techniques to the byproducts of SMC simulations, with the aim of enhancing the utility of SMC analyses. Typically, if SMC gives unexpected results, the modeler has to discover whether these come from actual characteristics of the system or from bugs in the model. This is done in a black-box manner, based only on the obtained numerical values. We improve on this by using PM to get a white-box perspective on the dynamics of the system observed by SMC. Roughly speaking, we feed the samples generated by SMC to PM tools, obtaining a compact graphical representation of the observed dynamics. 
This mined PM model is then transformed into a mined QFLan model, making it accessible to PL engineers. Using two well-known PL models, we show that our methodology is effective (helps in pinpointing issues in models, and in suggesting fixes), and that it scales to complex models. We also show that it is general, by applying it to the security domain.
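To make the idea of feeding SMC samples to PM concrete, the following is a minimal Python sketch under strong simplifying assumptions. It does not use QFLan or any actual SMC or PM tool: the toy process, the action names (`install_gps`, `install_basket`, `sell`), and all function names are hypothetical and chosen only for illustration. The sketch draws sample traces from a small probabilistic process, estimates the probability of an event as the fraction of samples in which it occurs, and then summarizes the same samples as a directly-follows graph, one common process-mining abstraction.

```python
import random
from collections import Counter

# Hypothetical toy actions; not taken from any QFLan model.
ACTIONS = ["install_gps", "install_basket", "sell"]


def simulate(rng):
    """Generate one sample trace: install features in random order until 'sell'."""
    trace, installed = [], set()
    while True:
        action = rng.choice(ACTIONS)
        if action in installed:
            continue  # assumption: a feature can be installed at most once
        trace.append(action)
        if action == "sell":
            return trace
        installed.add(action)


def estimate_probability(traces, event):
    """Monte Carlo estimate: fraction of traces in which `event` occurs."""
    return sum(event in t for t in traces) / len(traces)


def directly_follows(traces):
    """Count how often action b directly follows action a across all traces."""
    dfg = Counter()
    for t in traces:
        path = ["START"] + t + ["END"]
        dfg.update(zip(path, path[1:]))
    return dfg


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    traces = [simulate(rng) for _ in range(10_000)]
    print("P(install_gps) ~", estimate_probability(traces, "install_gps"))
    for (a, b), n in sorted(directly_follows(traces).items()):
        print(f"{a} -> {b}: {n}")
```

The numerical estimate alone is the black-box view. The edge counts of the directly-follows graph expose which action sequences actually occurred in the samples, which is the kind of white-box information a modeler could inspect when an estimate looks suspicious.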