Predicting runtime of computational jobs in distributed computing environment

A.G. Feoktistov, O.Yu. Basharina

doi:10.47350/iccs-de.2020.10

Abstract

The paper addresses the problem of predicting the runtime of jobs that execute problem-solving schemes of large-scale applications in a heterogeneous distributed computing environment. Such an environment includes nodes with various hardware architectures, different system software, and diverse computational capabilities. We believe that increasing the accuracy of job runtime prediction can significantly improve the efficiency of problem-solving and the rational use of resources in the heterogeneous environment. To this end, we propose new models that take into account various estimations of the module runtime for all modules included in the problem-solving scheme. These models were developed using the special computational model of distributed applied software packages (large-scale scientific applications). In addition, we compare the prediction results (job runtimes and their errors) obtained with different estimations: estimations obtained through module testing, user estimations, and estimations based on computational history. These results were obtained in continuous integration, delivery, and deployment of applied and system software of a package for solving warehouse logistics problems. They show that the highest accuracy is achieved by module testing.

Highlights

Today, scientific applications focus on carrying out large-scale scientific experiments in a heterogeneous distributed computing environment
These results show that the error in the job runtime predicted from module testing decreases as the data size increases in both cases
The rational allocation of resources in solving large problems in a heterogeneous distributed computing environment depends on the effectiveness of job maintenance schedules planned by local resource managers (LRMs)

Summary

Introduction

Scientific applications focus on carrying out large-scale scientific experiments in a heterogeneous distributed computing environment. In the environment with virtualized resources, the estimate $E_{vs}$ of the job runtime in the asynchronous mode is defined as follows:

$$E_{vs} = \max_{i=\overline{1,k}} e_i^{\tau}, \qquad e_i^{\tau} = e_i^{q} + e_i^{vml} + e_i + e_i^{vmr} + \forall j \in \overline{1,k_m} : pa_{ij}^{x} = 1,\ i \neq j\ \left(e_j^{\tau} + u\right).$$

In the asynchronous mode with restarting modules, the estimate $E_{rs}$ of the job runtime is defined as follows:

$$E_{rs} = \max_{i=\overline{1,k}} e_i^{\tau}, \qquad e_i^{\tau} = e_i^{q} + e_i^{f} + e_i + e_i^{run} + e_i^{res} + \forall j \in \overline{1,k_m} : pa_{ij}^{x} = 1,\ i \neq j\ \left(e_j^{\tau} + u\right).$$
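Both estimates share one recursive structure: a per-module sum of terms plus a contribution from the modules selected by the condition on j. The following Python sketch illustrates how such a recursion might be evaluated. It rests on several assumptions that are not stated in the text: the selected modules j are treated as predecessors of module i, their terms are combined by taking the maximum, the term is zero when no module is selected, and u is a fixed constant. The function name `predict_async_runtime`, the dictionary encoding, and all numeric values are hypothetical.

```python
def predict_async_runtime(module_terms, predecessors, u=0.0):
    """Return the maximum over modules i of the completion estimate e_i^tau.

    module_terms: {module id: {term name: time}}, the non-recursive summands
                  (e.g. q, vml, e, vmr for the virtualized case).
    predecessors: {module id: iterable of module ids} (hypothetical encoding
                  of the condition that selects the modules j for module i).
    u:            constant added to each selected module's estimate.
    """
    finished = {}        # memoised e_i^tau values
    in_progress = set()  # modules on the current recursion path

    def completion(i):
        if i in finished:
            return finished[i]
        if i in in_progress:
            raise ValueError(f"cyclic dependency involving module {i!r}")
        in_progress.add(i)
        # Assumption: selected terms are combined with max; i != j is enforced.
        waits = [completion(j) + u for j in predecessors.get(i, ()) if j != i]
        in_progress.discard(i)
        finished[i] = sum(module_terms[i].values()) + max(waits, default=0.0)
        return finished[i]

    return max(completion(i) for i in module_terms)


# Illustrative values only: module 3 waits for modules 1 and 2.
virtualized = {
    1: {"q": 2.0, "vml": 5.0, "e": 10.0, "vmr": 1.0},
    2: {"q": 1.0, "vml": 5.0, "e": 20.0, "vmr": 1.0},
    3: {"q": 0.5, "vml": 5.0, "e": 4.0, "vmr": 0.5},
}
print(predict_async_runtime(virtualized, {3: [1, 2]}, u=1.0))  # 38.0
```

The same function covers the restart case if each module's dictionary carries the terms q, f, e, run, and res instead, since only the set of non-recursive summands differs between the two estimates.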

Results

Conclusion