Abstract

Cloud computing has established itself as a solid computational model that allows scientists to use distributed virtual resources to execute a wide range of scientific experiments. Many of these experiments demand high performance, since their activities are data and compute intensive, which makes parallelism techniques a key part of the experimentation process. Several approaches provide parallelism capabilities for scientific workflows in clouds. However, most of them rely on the scientist to dimension the virtual cluster to be instantiated. Dimensioning the virtual cluster for parallel workflow execution can be hard: it is difficult to define, and later adapt, the optimal number of virtual machines to use. Most systems keep the scientist's manual configuration for the whole workflow execution and apply adaptive techniques only in the presence of failures. Because of the huge number of options (virtual machine types) available to configure a cloud environment, manual configuration commonly becomes impractical, and a configuration that is not adjusted adaptively during execution can degrade workflow performance or cause an excessive increase in financial cost. This paper proposes a service called SciDim, which combines a multi-objective cost function with genetic algorithms and provenance data to help determine an ideal initial configuration for the virtual cluster, under budget and deadline constraints set by the scientist.
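
As a rough illustration of the idea in the last sentence, the following Python sketch pairs a weighted time-and-money cost function with a small genetic algorithm that picks how many virtual machines of each type to start, under a deadline and a budget. It is not SciDim's actual model: the VM catalogue, prices, workload estimate, per-VM overhead, weight, and the functions `estimate`, `cost`, and `genetic_search` are all assumptions made for this example. In a real setting, provenance data from earlier runs would presumably supply the runtime estimates.

```python
import random

# Hypothetical VM catalogue: name -> (relative speed, price per hour). Illustrative values only.
VM_TYPES = {"small": (1.0, 0.05), "medium": (2.0, 0.10), "large": (4.0, 0.22)}
NAMES = list(VM_TYPES)
WORK = 200.0       # assumed total work, in hours on one "small" VM
OVERHEAD = 0.1     # assumed coordination overhead, in hours per VM
DEADLINE = 10.0    # hours, set by the scientist
BUDGET = 15.0      # currency units, set by the scientist
ALPHA = 0.5        # weight of time versus money in the cost function
MAX_PER_TYPE = 16  # search bound on the number of VMs of each type


def estimate(config):
    """Return (hours, money) for a tuple of VM counts, one count per type."""
    speed = sum(n * VM_TYPES[t][0] for t, n in zip(NAMES, config))
    if speed == 0:
        return float("inf"), float("inf")
    hours = WORK / speed + OVERHEAD * sum(config)
    price_per_hour = sum(n * VM_TYPES[t][1] for t, n in zip(NAMES, config))
    return hours, hours * price_per_hour


def cost(config):
    """Weighted sum of normalized time and money; lower is better.

    Feasible configurations score at most 1.0. Infeasible ones score above
    1.0, so every feasible configuration ranks ahead of every infeasible one.
    """
    hours, money = estimate(config)
    violation = max(0.0, hours / DEADLINE - 1) + max(0.0, money / BUDGET - 1)
    if violation > 0:
        return 1.0 + violation
    return ALPHA * hours / DEADLINE + (1 - ALPHA) * money / BUDGET


def genetic_search(generations=60, pop_size=30, seed=0):
    """Search VM counts with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, MAX_PER_TYPE) for _ in NAMES) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]  # truncation selection; the best survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(len(child))                     # mutate one gene by +/- 1
            child[i] = min(MAX_PER_TYPE, max(0, child[i] + rng.choice((-1, 1))))
            children.append(tuple(child))
        pop = parents + children
    return min(pop, key=cost)


if __name__ == "__main__":
    best = genetic_search()
    hours, money = estimate(best)
    print(dict(zip(NAMES, best)), f"hours={hours:.2f}", f"money={money:.2f}")
```

The weight `ALPHA` expresses how the scientist trades execution time against financial cost. Ranking infeasible configurations strictly below feasible ones is one simple way to encode the budget and deadline constraints; other penalty schemes are equally plausible.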
