Abstract

In the era of big data, mining and analysis of enormous amounts of data are widely used to support decision-making. This complex process, which includes collecting, storing, transmitting, and analyzing huge volumes of data, can be modeled as a workflow. Meanwhile, cloud environments provide sufficient computing and storage resources for big data management and analytics. Because clouds adopt a pay-as-you-go pricing scheme, executing a workflow in the cloud incurs charges for the provisioned resources, so cost-effective resource provisioning for workflows in clouds remains a critical challenge. Moreover, the responses of such complex data management processes are usually required in real time, making the deadline the most crucial constraint on workflow execution. To achieve cost-effective resource provisioning while meeting the real-time requirements of workflow execution, we propose a resource provisioning strategy based on dynamic programming that minimizes the cost of workflow execution in clouds, together with a critical-path-based workflow partition algorithm that guarantees the workflow completes before its deadline. Our approach is evaluated by simulation experiments with real-time workflows of different sizes and structures. The results demonstrate that our algorithm outperforms existing classical algorithms.

Highlights

  • Nowadays, big data technology has been used in a wide range of applications, including complex systems that support decision-making [1, 2]

  • Since cost optimization performs better and more stably on workflows of larger size and longer critical path, we take a workflow of 1000 tasks with a critical path of length 200 as the simulation instance to compare the performance of three cost optimization algorithms: GRP4RW, IC-PCP, and the Dynamic Programming Knapsack Algorithm (DPK)

  • We modeled the problem as an optimization problem that aims to obtain a cost-effective resource provisioning to execute a workflow while meeting a deadline constraint. The problem was solved by using the dynamic programming knapsack algorithm
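The dynamic programming knapsack formulation in the highlights can be sketched as a multiple-choice knapsack over a discretized deadline: each workflow partition picks exactly one VM type, its execution time acts as the item weight, and its pay-as-you-go rental cost is the value to minimize. The sketch below is illustrative only; the workloads, VM types (speed, price pairs), and time discretization are hypothetical assumptions, not values from the paper.

```python
import math

def min_cost_provisioning(workloads, vm_types, deadline):
    """Multiple-choice knapsack DP: pick one VM type per partition so that
    total execution time <= deadline and total rental cost is minimal.

    workloads: list of work amounts, one per workflow partition (hypothetical units)
    vm_types:  list of (speed, price_per_time_unit) tuples
    deadline:  integer time budget (discretized)
    Returns the minimal total cost, or None if the deadline is infeasible.
    """
    INF = float("inf")
    # dp[t] = minimal cost to run all partitions considered so far within time t
    dp = [0.0] * (deadline + 1)  # no partitions yet: cost 0 for any budget
    for w in workloads:
        ndp = [INF] * (deadline + 1)
        for speed, price in vm_types:
            t = math.ceil(w / speed)  # discretized execution time on this VM type
            c = t * price             # pay-as-you-go cost for that duration
            for total in range(t, deadline + 1):
                if dp[total - t] + c < ndp[total]:
                    ndp[total] = dp[total - t] + c
        dp = ndp
    best = min(dp)
    return best if best < INF else None
```

For example, with two partitions of workload 10 each, a slow cheap VM (speed 1, price 1) and a fast expensive VM (speed 2, price 3), a deadline of 20 allows two slow VMs at cost 20, while tightening the deadline to 15 forces one fast VM and raises the cost to 25.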


Summary

Introduction

Big data technology has been used in a wide range of applications, including complex systems that support decision-making [1, 2]. Along with the enormous commercial benefits, scientific advances, management efficiency, and analytical accuracy brought by big data, this new technology raises many challenging problems, such as the high cost and latency of big data storage, transmission, and processing [3,4,5]. To tackle these problems, cloud computing environments and workflow modeling methods are recognized as effective tools. Workflow execution in clouds involves two stages: resource provisioning and task scheduling. That is, the second stage determines where and when each task of a workflow will be executed, while the first stage decides what types of resources, and how many, will be leased from the cloud service providers; the total cost of the workflow is mainly decided at this first stage. To address these distinctions between task scheduling and resource provisioning, we propose in this paper a novel cost optimization algorithm that focuses only on resource provisioning.
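Because the deadline guarantee in this paper rests on the workflow's critical path (the longest chain of dependent tasks in the workflow DAG), a minimal sketch of how that path length is computed may help. The task names and durations below are hypothetical; this is a standard topological-order longest-path computation, not the paper's own partition algorithm.

```python
from collections import defaultdict, deque

def critical_path_length(tasks, edges):
    """Length of the critical path of a workflow DAG.

    tasks: dict mapping task name -> execution duration
    edges: list of (u, v) pairs meaning task u must finish before v starts
    """
    succ = defaultdict(list)
    indeg = {t: 0 for t in tasks}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # finish[t] = latest possible finish time of t along any dependency chain
    finish = {t: tasks[t] for t in tasks}
    queue = deque(t for t in tasks if indeg[t] == 0)  # Kahn's topological order
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            finish[v] = max(finish[v], finish[u] + tasks[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return max(finish.values())
```

For instance, with tasks a (2), b (3), c (4) and precedence edges a→b and a→c, the critical path is a→c with length 6; any deadline shorter than this is infeasible regardless of how many resources are provisioned.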

Related Work
Related Model
Cost-Effective Resource Provisioning
Cost-Effective Resource Provisioning Algorithm
Performance Evaluation
Conclusion