Abstract

With the growth of content-sharing and collaborative computing services such as online social networks and scientific workflows, huge amounts of data are being generated. To process this data, multi-cloud systems, which integrate multiple clouds to provide a unified service in a collaborative manner, have been introduced. However, task scheduling in such a heterogeneous multi-cloud environment is very challenging. To reduce the response delay caused by cross-data-center file access, we propose a replica-aware task scheduling algorithm based on data replication. To speed up data access in multi-cloud cooperative caches, we present a load-balanced cache placement algorithm based on Bayesian networks. Our scheduling algorithm combines transferring computation with transferring data: resources are matched according to node locality, and only the input data of non-local unassigned and failed map tasks are replicated and transferred in advance to the target nodes to expedite task execution. Our cache placement method uses Bayesian networks to predict the next task to be executed, selects the files to prefetch into the cache according to caching profit and recycling cost, and determines the target placement node for each prefetched file according to load balancing. Extensive experimental results show that the proposed replica-aware task scheduling algorithm outperforms benchmark scheduling algorithms in terms of node locality ratio and job response time, and that the load-balanced cache placement algorithm outperforms baseline caching algorithms in terms of prefetching hit ratio and execution time saving ratio.
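
The cache placement idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not define caching profit, recycling cost, or the load metric, so the scoring rule (predicted access probability times time saved, minus recycling cost) and the load measure (used cache space over capacity) are assumptions made only for illustration. The names `CacheNode`, `Candidate`, and `select_and_place` are likewise hypothetical, and the access probability stands in for the output of a Bayesian-network predictor that is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class CacheNode:
    """Hypothetical cache node; load is assumed to be used / capacity."""
    name: str
    capacity: int
    used: int = 0
    files: list = field(default_factory=list)

    def load(self) -> float:
        return self.used / self.capacity


@dataclass
class Candidate:
    """Hypothetical prefetch candidate."""
    file_id: str
    size: int
    access_prob: float   # assumed to come from a task predictor
    fetch_saving: float  # assumed time saved if the file is hit in cache
    recycle_cost: float  # assumed cost of recycling the cached copy later


def select_and_place(candidates, nodes):
    """Pick candidates with positive net benefit, best first, and place
    each one on the least-loaded node that still has room for it."""
    scored = [
        (c.access_prob * c.fetch_saving - c.recycle_cost, c)
        for c in candidates
    ]
    # Keep only files whose assumed profit exceeds their recycling cost.
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    placement = {}
    for _, c in scored:
        fitting = [n for n in nodes if n.capacity - n.used >= c.size]
        if not fitting:
            continue  # no node can hold this file; skip it
        target = min(fitting, key=CacheNode.load)
        target.used += c.size
        target.files.append(c.file_id)
        placement[c.file_id] = target.name
    return placement
```

In this sketch, placing a file raises the load of its target node, so later files tend to be spread across the remaining nodes rather than piling onto one.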
