Abstract

Driven by technical factors such as system reliability, bandwidth constraints, data confidentiality, and security, as well as economic factors such as initial capital expenditure and recurring operating expenditure, today's cloud computing increasingly adopts the hybrid cloud model. However, because hybrid clouds scale both numerically and geographically, network delay becomes the main constraint on remote file system access. To hide network latency and reduce job completion time in Hadoop-based hybrid cloud data access, we propose two methods: a scheduling-aware data prefetching scheme that enhances the data locality of non-local map tasks in a Hadoop-based centralized hybrid cloud (CHCDLOS-Prefetch), and a file synchronization method that decreases job execution delay in a Hadoop-based distributed hybrid cloud (DHCDLO-Sync). In the former, input data for non-local map tasks are fetched ahead of time to the target compute nodes using idle network bandwidth. In the latter, which operates at the level of job scheduling, highly popular data files are proactively synchronized among sub-clouds to strengthen intra-sub-cloud data locality in the distributed hybrid cloud. Extensive experimental results show that, compared with the Capacity, Fair, and DARE algorithms, the proposed algorithms yield greater improvements in hybrid cloud data locality and job completion time.
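
The popularity-driven synchronization idea can be illustrated with a minimal sketch. It is not the DHCDLO-Sync algorithm itself, whose details are not given here. The function name `pick_files_to_sync`, the access-log format, the replica map, and the `top_k` cutoff are all hypothetical choices made for illustration.

```python
from collections import Counter

def pick_files_to_sync(access_log, replicas, sub_clouds, top_k=2):
    """Return {sub_cloud: [files]} listing popular files that each sub-cloud lacks.

    access_log: iterable of (file_name, requesting_sub_cloud) pairs (assumed format).
    replicas:   dict mapping file_name -> set of sub-clouds already holding a copy.
    top_k:      number of most-accessed files treated as "popular" (assumed cutoff).
    """
    # Popularity is approximated here by raw access counts.
    popularity = Counter(file_name for file_name, _ in access_log)
    hot_files = [name for name, _ in popularity.most_common(top_k)]

    plan = {}
    for cloud in sub_clouds:
        # A file needs syncing if this sub-cloud holds no replica of it.
        missing = [f for f in hot_files if cloud not in replicas.get(f, set())]
        if missing:
            plan[cloud] = missing
    return plan

# Hypothetical example data: two sub-clouds and three files.
access_log = [
    ("a.csv", "c1"), ("a.csv", "c2"), ("b.csv", "c1"),
    ("a.csv", "c2"), ("c.csv", "c2"), ("b.csv", "c2"),
]
replicas = {"a.csv": {"c1"}, "b.csv": {"c1", "c2"}, "c.csv": {"c2"}}
plan = pick_files_to_sync(access_log, replicas, ["c1", "c2"])
print(plan)
```

In this toy input, `a.csv` is the most accessed file and sub-cloud `c2` holds no copy, so the plan schedules it for synchronization to `c2` before jobs there request it.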
