Abstract

The breadth and depth of information being generated and stored continue to grow rapidly, causing an information explosion. Observational devices and remote sensing equipment are no exception, giving researchers new avenues for detecting and predicting phenomena at a global scale. To cope with increasing storage loads, hybrid clouds offer an elastic solution that also satisfies processing and budgetary needs. In this article, the authors describe their algorithms and system design for managing voluminous datasets in a hybrid cloud setting. Their distributed storage framework autonomously tunes in-memory data structures and query parameters to ensure efficient retrievals and minimize resource consumption. To circumvent processing hotspots, they predict changes in incoming traffic and federate their query-resolution structures to the public cloud for processing. They demonstrate the framework's efficacy on a real-world, petabyte-scale dataset consisting of more than 20 billion files.
