This work aims to estimate the execution time of data processing tasks (specific executions of a program or an algorithm) before they are run. The paper focuses on estimating the average-case execution time (ACET). This metric can be used to predict the approximate cost of computations, e.g., when the resource consumption of a job in a High-Performance Computing system must be known in advance. The proposed approach builds machine learning models from historical execution data. The models use program metadata (properties of the input data) and parameters of the run-time environment as their explanatory variables, and this set of variables can easily be extended with additional parameters specific to individual programs. The program code itself is treated as a black box; the response variable of each model is the execution time. The models have been validated within a Large-Scale Computing system that allows programs to be treated uniformly as computation modules. We present the training and validation process for several different computation modules and discuss the suitability of the proposed models for ACET estimation in various computing environments.
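The abstract does not fix a concrete model family. As one minimal illustration (an assumption, not the paper's actual pipeline), an ACET predictor can be an ordinary least-squares regression fitted to historical runs, with input-data size standing in for program metadata and core count for a run-time environment parameter; the feature names and the synthetic data below are purely illustrative:

```python
# Minimal sketch (an assumption, not the paper's pipeline): predict
# execution time from historical runs via ordinary least squares.
# Explanatory variables: input size in MB (program metadata) and core
# count (run-time environment); response variable: execution time in s.

def fit_ols(X, y):
    """Ordinary least squares with an intercept, via normal equations."""
    rows = [[1.0] + list(x) for x in X]        # prepend intercept column
    n = len(rows[0])
    # Build the normal-equation system (A^T A) coef = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    for i in range(n):                          # Gaussian elimination with pivoting
        p = max(range(i, n), key=lambda k: abs(ata[k][i]))
        ata[i], ata[p] = ata[p], ata[i]
        aty[i], aty[p] = aty[p], aty[i]
        for k in range(i + 1, n):
            f = ata[k][i] / ata[i][i]
            for j in range(i, n):
                ata[k][j] -= f * ata[i][j]
            aty[k] -= f * aty[i]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):              # back substitution
        coef[i] = (aty[i] - sum(ata[i][j] * coef[j]
                                for j in range(i + 1, n))) / ata[i][i]
    return coef

def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

# Synthetic history, generated from t = 2 + 0.05*size_mb + 1.5*cores,
# so the fitted model recovers that rule exactly.
X = [(100, 1), (200, 1), (100, 2), (400, 4), (300, 2)]
y = [8.5, 13.5, 10.0, 28.0, 20.0]
coef = fit_ols(X, y)
estimate = predict(coef, (250, 2))  # ACET estimate before running the task
```

In the black-box setting the abstract describes, only such metadata and environment parameters feed the model, never the program code itself; richer feature sets and nonlinear regressors could be substituted without changing that interface.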