Model transformation languages are domain-specific languages used to define transformations of models. These transformations consist of translating one modeling formalism into another or simply updating a given model. Such transformations are often described declaratively and are often implemented based on very small models that cover the language of the input model. As a result, transformation developers are often unable to assess the time required to transform a larger model.

Hence, we propose a prediction approach based on machine learning that uses a set of model characteristics as input and predicts the execution time of a transformation defined in the Atlas Transformation Language (ATL). In our previous work (Groner et al., 2023), we showed that support vector regression in combination with a model characterization based on the number of model elements, the number of references, and the number of attributes is the best choice in terms of usability and prediction accuracy for the transformations considered in our experiments.

A major weakness of our previous approach is that it fails to predict the performance of transformations that also transform attribute values of arbitrary length, such as string values. Therefore, we investigate in this work whether extending our feature sets with a description of the average size of string attributes can help to overcome this weakness.

Our results show that the random forest approach in combination with model characterizations based on the number of model elements, the number of references, the number of attributes, and the average size of string attributes filtered by the 85th percentile of their variance is the best choice in terms of the simplicity of describing a model and the quality of the resulting prediction. With this combination, we obtained a mean absolute percentage error (MAPE) of 5.07% over all modules and a MAPE of 4.82% over all modules excluding the transformation for which our previous approach failed. In contrast, our previous approach yielded a MAPE of 38.48% over all modules and a MAPE of 4.45% over all modules excluding that transformation.
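For reference, the MAPE values above follow the standard definition of the metric (given here as a reminder, not a formula stated in the abstract):

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

where $y_i$ is the measured execution time and $\hat{y}_i$ the predicted execution time for the $i$-th input model.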
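The following minimal sketch illustrates the kind of pipeline the abstract describes: characterizing models by counts and average string-attribute sizes, keeping only features whose variance reaches the 85th percentile, and fitting a random forest regressor evaluated with MAPE. The synthetic data, the feature layout, and the direction of the variance filter (keeping high-variance features) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Each row characterizes one input model: element counts per metaclass,
# reference counts, attribute counts, and average string-attribute lengths
# (placeholder synthetic data standing in for real model characterizations).
X = rng.integers(1, 10_000, size=(500, 40)).astype(float)
# Synthetic execution times (seconds) as the regression target.
y = X[:, :5].sum(axis=1) * 1e-3 + rng.normal(0, 0.5, size=500)

# Keep only features whose variance is at or above the 85th percentile
# of all feature variances (assumed interpretation of the filter).
variances = X.var(axis=0)
mask = variances >= np.percentile(variances, 85)
X_filtered = X[:, mask]

X_train, X_test, y_train, y_test = train_test_split(
    X_filtered, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Report prediction quality as MAPE, the metric used in the abstract.
mape = mean_absolute_percentage_error(y_test, model.predict(X_test))
print(f"MAPE: {mape:.2%}")
```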