Abstract

Big Data (BD) platforms have a long tradition of leveraging trends and technologies from the broader computer networking and communication community. For several years, homogeneous clusters of dedicated servers were the dominant paradigm in BD networks. In recent years, the BD landscape has changed, adopting different deployment architectures with various network models. This trend has created opportunities and challenges that push BD practitioners toward the next-generation BD vision, in particular addressing BD velocity with batch and micro-batch processing. Nevertheless, the literature lacks an extensive study of the impacts of adopting these new deployment architectures, although the topic holds significant research interest. This study addresses that gap, offering a comprehensive review of the architectural elements of BD batch query deployment models and environments. A novel taxonomy is proposed to classify these models based on their underlying communication systems. We first discuss the batch query processing requirements as comparison criteria for BD communication models and compare their salient features. The benefits and challenges of these environments relative to traditional on-premise dedicated BD clusters are presented. Thereafter, we provide an extensive survey of modern BD deployment architectures, categorizing them based on their underlying infrastructure. Finally, we outline several directions for future research on improving the state of the art of the BD landscape and provide recommendations for BD practitioners on emerging environments supporting BD applications and large-scale data analytics in general.
