Abstract

Effective planning and optimized execution of e-Science workflows in distributed systems, such as the Grid, require predictions of workflow execution times. However, predicting the execution times of e-Science workflows in heterogeneous distributed systems is challenging because of the complex structure of workflows, variations in input problem size, and the heterogeneous and dynamic nature of shared resources. To this end, we propose two novel workflow execution time-prediction methods based on machine learning ensemble models. In this paper, we demonstrate our approach in two different real Grid environments. Our approach can effectively predict the execution time of scientific workflow applications in the Grid for various problem sizes, Grid sites, and runtime environments. We characterize workflow performance in the Grid using attributes that define the structure of the workflow as well as the execution environment. Unlike common ensembles, our ensemble systems employ three strong learners, whose strengths offset one another's weaknesses in modeling workflow execution times. The proposed methods have been thoroughly evaluated on three real-world e-Science workflow applications. The experimental results demonstrate that the proposed multi-model ensembles can significantly decrease the prediction error (by 50%, on average) compared with methods based on the radial basis function neural network, local learning, and performance templates. The proposed methods can also be applied with similar effectiveness, and without any major modification, to other heterogeneous distributed environments, such as the Cloud.
