Abstract

Currently, the data to be explored and exploited by computing systems increases at an exponential rate. This massive amount of data, the so-called “Big Data”, puts pressure on existing technologies to provide scalable, fast and efficient support. Recent applications and the current user support from multi-domain computing have assisted in migrating from data-centric to knowledge-centric computing. However, it remains a challenge to optimally store, place or migrate such huge data sets across data centers (DCs). In particular, because application and DC behaviour (i.e., resources or latencies) changes frequently, data access and usage patterns need to be analysed as well. The main objective is to find a better data storage location that improves both the overall data placement cost and the application performance (such as throughput). In this survey paper, we provide a state-of-the-art overview of Cloud-centric Big Data placement together with data storage methodologies. It is an attempt to highlight the actual correlation between the two in terms of better supporting Big Data management. Our focus is on management aspects, seen under the prism of non-functional properties. In the end, readers can appreciate the deep analysis of the respective technologies related to the management of Big Data and be guided towards selecting them in the context of satisfying their non-functional application requirements. Furthermore, we highlight challenges and current gaps in Big Data management, marking the way it needs to evolve in the near future.

Highlights

  • Over time, the type of applications has evolved from batch, compute- or memory-intensive applications to streaming or even interactive applications

  • The twofold advantage of identifying efficient ways to manage and store Big Data is: (i) practitioners can select the most suitable Big Data management solutions to satisfy both their functional and non-functional needs; (ii) researchers can fully comprehend the research area and identify the most interesting directions to follow. We cover both the data placement and the storage issues, focusing on the Big Data management lifecycle and Cloud computing under the prism of non-functional aspects

  • Systematic literature review (SLR) protocol formation is a composite step involving the identification of (i) the sources (here we have primarily consulted the Web of Science and Scopus) and (ii) the actual terms for querying these sources (here, we focus on population, intervention and outcome, as mentioned in [34])
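A population/intervention/outcome search string of this kind is typically composed by OR-joining the synonyms within each group and AND-joining the groups. The following is a minimal sketch of that composition; the term lists are hypothetical examples for illustration, not the exact terms used in this survey's protocol.

```python
# Hypothetical example term groups for an SLR search string.
population = ["big data", "large-scale data"]
intervention = ["data placement", "data storage", "data migration"]
outcome = ["cloud", "data center", "data centre"]

def or_group(terms):
    """OR-join the synonyms of one group, quoting multi-word phrases."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# AND-join the population, intervention and outcome groups.
query = " AND ".join(or_group(g) for g in (population, intervention, outcome))
print(query)
```

Running this prints a single boolean string that can be pasted into the advanced-search field of bibliographic databases such as Web of Science or Scopus.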


Introduction

The type of applications has evolved from batch, compute- or memory-intensive applications to streaming or even interactive applications. Applications are getting more complex and becoming long-running. Such applications might require frequent access to multiple distributed data sources. Big Data is characterised by five properties [1, 2]. A large set of different data types generated from various sources can hold enormous information (in the form of relationships [3], system access logs, and quality of service (QoS) measures). Such knowledge can be critical for improving both products and services. To retrieve the underlying knowledge from such large data sets, an efficient data processing ecosystem and knowledge filtering methodologies are needed.
