Abstract

In hybrid cloud environments, reasonable data placement strategies are critical to the efficient execution of scientific workflows. Because of varying loads, bandwidth fluctuations, and network congestion between data centers, as well as the dynamics of hybrid cloud environments, the data transmission time is uncertain, which poses a major challenge to efficient data placement for scientific workflows. However, most traditional data placement solutions assume deterministic cloud environments, which leads to excessive data transmission time for scientific workflows. To address this problem, we propose an adaptive discrete particle swarm optimization algorithm based on fuzzy theory and genetic algorithm operators (DPSO-FGA) to minimize the fuzzy data transmission time of scientific workflows. The DPSO-FGA places scientific workflow data while meeting data privacy requirements and the capacity limits of data centers. Simulation results show that the DPSO-FGA effectively reduces the fuzzy data transmission time of scientific workflows in hybrid cloud environments.

Highlights

  • With the widespread application of Big Data technologies, the amount of data generated by modern network environments is increasing greatly. Therefore, traditional distributed computing modes such as grid computing may not meet the requirements of massive data processing.

  • From the perspective of algorithms, the DPSO-FGA outperforms the constraint fuzzy randomized algorithm (CFRA) and the constraint fuzzy greedy algorithm (CFGA). This is because the CFGA may fall into a local optimum by using the genetic algorithm (GA) during execution, and it ignores global performance.

  • The overall performance of the CFRA is better than that of the CFGA, since the search space of the CFRA is larger than that of the CFGA and the CFRA does not fall into a local optimum. The CFRA can therefore obtain a good solution when the algorithm runs for a long time.

Summary

Introduction

With the widespread application of Big Data technologies, the amount of data generated by modern network environments is increasing greatly. Therefore, traditional distributed computing modes such as grid computing may not meet the requirements of massive data processing.

Based on the analytic hierarchy process (AHP) model, a data placement strategy was proposed in [28] to select the most suitable storage sites by applying fuzzy comprehensive evaluation to candidate data centers for different users. That work did not address the data placement problem of scientific workflows, and its fuzzy object and optimization goal were not the data transmission time.

(ii) Based on the problem definitions and modeling, the DPSO-FGA is proposed as the second contribution. It reduces the fuzzy data transmission time while considering the uncertainty of data transmission time, the different numbers and capacities of private data centers, and network bandwidth limitations, so it can adapt well to real-world network environments.
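To make the optimization goal concrete, the following is a minimal sketch of how one candidate placement might be scored. It is not the authors' implementation. It assumes that uncertain bandwidth is modeled as a triangular fuzzy number (low, mid, high), that a fuzzy time is ranked by the weighted mean (l + 2m + u) / 4, and that each task runs at a known data center; the summary states none of these details. All names (`DataCenter`, `Dataset`, `pinned_to`, `is_feasible`, `total_fuzzy_time`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumption: a triangular fuzzy number (low, mid, high).
Fuzzy = Tuple[float, float, float]


@dataclass(frozen=True)
class DataCenter:
    capacity: float  # storage capacity in MB; float("inf") for a public cloud


@dataclass(frozen=True)
class Dataset:
    size: float                      # size in MB
    pinned_to: Optional[str] = None  # privacy: must stay in this data center


def is_feasible(placement: Dict[str, str],
                datasets: Dict[str, Dataset],
                centers: Dict[str, DataCenter]) -> bool:
    """Check the privacy and capacity constraints for one placement."""
    used = {name: 0.0 for name in centers}
    for ds_name, ds in datasets.items():
        target = placement[ds_name]
        if ds.pinned_to is not None and target != ds.pinned_to:
            return False  # a private dataset left its data center
        used[target] += ds.size
    return all(used[n] <= centers[n].capacity for n in centers)


def fuzzy_transfer_time(size: float, bandwidth: Fuzzy) -> Fuzzy:
    """Time = size / bandwidth; the highest bandwidth gives the lowest time."""
    low, mid, high = bandwidth
    return (size / high, size / mid, size / low)


def total_fuzzy_time(placement: Dict[str, str],
                     datasets: Dict[str, Dataset],
                     task_inputs: Dict[str, List[str]],
                     task_site: Dict[str, str],
                     bandwidth: Dict[Tuple[str, str], Fuzzy]) -> Fuzzy:
    """Sum the fuzzy transfer times of every input that must be moved."""
    total = (0.0, 0.0, 0.0)
    for task, inputs in task_inputs.items():
        dst = task_site[task]
        for ds_name in inputs:
            src = placement[ds_name]
            if src != dst:  # local data needs no transfer
                t = fuzzy_transfer_time(datasets[ds_name].size,
                                        bandwidth[(src, dst)])
                total = (total[0] + t[0], total[1] + t[1], total[2] + t[2])
    return total


def defuzzify(t: Fuzzy) -> float:
    """Assumed ranking rule: weighted mean of the three points."""
    return (t[0] + 2 * t[1] + t[2]) / 4


# Toy instance: one private and one public data center.
centers = {"private": DataCenter(100.0), "public": DataCenter(float("inf"))}
datasets = {"d1": Dataset(60.0, pinned_to="private"), "d2": Dataset(80.0)}
task_inputs = {"t1": ["d1", "d2"], "t2": ["d2"]}
task_site = {"t1": "private", "t2": "public"}
bandwidth = {("public", "private"): (8.0, 10.0, 20.0),   # MB/s
             ("private", "public"): (8.0, 10.0, 20.0)}

placement = {"d1": "private", "d2": "public"}
total = total_fuzzy_time(placement, datasets, task_inputs, task_site, bandwidth)
print(is_feasible(placement, datasets, centers))  # True
print(total)                                      # (4.0, 8.0, 10.0)
print(defuzzify(total))                           # 7.5
```

In a swarm-based search such as the one described, a function of this kind would serve as the fitness of a particle, with infeasible placements rejected or penalized. Moving `d2` into the private data center would remove the transfer for `t1` but exceed the 100 MB capacity, so `is_feasible` rejects that placement.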

Problem Definitions
Effective Data Placement for Scientific Workflows Based on DPSO-FGA
Find the data centers ds_i in the placement of task t_j’s input dataset I_j
Performance Evaluation
Conclusions
