TomusBlobs: scalable data‐intensive processing on Azure clouds

Alexandru Costan, Goetz Brasche, Gabriel Antoniu, Radu Tudoran

doi:10.1002/cpe.3034

Abstract

The emergence of cloud computing has made large‐scale compute infrastructures available to an ever broader spectrum of applications and users. Although the cloud paradigm is attractive for its ‘elasticity’ in resource usage and the associated costs (users pay only for the resources they actually use), cloud applications still suffer from the high latency and low performance of cloud storage services. As Big Data analysis on clouds becomes increasingly relevant in many application areas, enabling high‐throughput processing of massive cloud‐hosted data becomes a critical issue, because it directly affects overall application performance. In this paper, we address this challenge at the level of cloud storage. We introduce TomusBlobs, a concurrency‐optimized data storage system that federates the virtual disks associated with the virtual machines running the application code on the cloud. We demonstrate the performance benefits of our solution for data‐intensive processing by building, on top of TomusBlobs, an optimized prototype MapReduce framework for Microsoft's Azure cloud platform. Finally, we address the limitations of state‐of‐the‐art MapReduce frameworks for reduce‐intensive workloads by proposing MapIterativeReduce, an extension of the MapReduce model. We validate these contributions through large‐scale experiments on the Azure commercial cloud, using both synthetic benchmarks and real‐world applications, with resources distributed across multiple data centers. The experiments show that our solutions bring substantial benefits to data‐intensive applications compared with approaches that rely on state‐of‐the‐art cloud object storage. Copyright © 2013 John Wiley & Sons, Ltd.
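The abstract does not spell out how MapIterativeReduce works. As a hedged, single‐process illustration only, the sketch below assumes that the extension replaces the single reduce pass with repeated rounds in which reducers merge groups of intermediate results until one final result remains, which is only safe when the reduce function is associative and commutative. The function name `map_iterative_reduce` and the `fan_in` parameter are hypothetical and do not come from the paper; the sketch models neither TomusBlobs nor Azure.

```python
from collections import Counter
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_iterative_reduce(
    inputs: Iterable[T],
    map_fn: Callable[[T], R],
    reduce_fn: Callable[[R, R], R],
    fan_in: int = 2,
) -> R:
    """Hypothetical sketch of an iterative reduce: map every input, then
    repeatedly merge groups of `fan_in` intermediate results until a single
    result remains. Assumes reduce_fn is associative and commutative."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    # Map phase: each input yields one intermediate result.
    level: List[R] = [map_fn(x) for x in inputs]
    if not level:
        raise ValueError("no inputs to process")
    # Iterative reduce phase: each round merges groups of fan_in results,
    # as independent reducers would, until only one result is left.
    while len(level) > 1:
        next_level: List[R] = []
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            acc = group[0]
            for item in group[1:]:
                acc = reduce_fn(acc, item)
            next_level.append(acc)
        level = next_level
    return level[0]


if __name__ == "__main__":
    # Word count over a few toy "documents"; Counter addition merges counts.
    docs = ["a b a", "b c", "a c c", "d"]
    print(map_iterative_reduce(docs, lambda d: Counter(d.split()),
                               lambda x, y: x + y))
    # Sum of squares of 1..10 with three-way merges per round.
    print(map_iterative_reduce(range(1, 11), lambda x: x * x,
                               lambda a, b: a + b, fan_in=3))
```

With ten inputs and `fan_in=3`, the intermediate list shrinks from 10 to 4 to 2 to 1, so the final result is produced without funnelling all intermediate data through a single reducer in one pass; this is the kind of bottleneck that a reduce‐intensive workload would otherwise hit.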
