Approaches of enhancing interoperations among high performance computing and big data analytics via augmentation

Ajeet Ram Pathak, Siddharth S Rautaray, Manjusha Pandey

doi:10.1007/s10586-019-02960-y

Abstract

The dawn of exascale computing and its convergence with big data analytics have greatly spurred research interest. The reasons are straightforward. Traditionally, high performance computing (HPC) systems have been used for scientific applications dominated by compute-intensive tasks. At the same time, the proliferation of big data has led to the design of data-intensive processing paradigms such as the Apache big data stack. Because big data is generated at a high pace, faster processing mechanisms are needed to obtain insights in real time, and HPC systems may serve as a panacea for such big data problems. Although HPC systems are capable of delivering promising results for big data, directly integrating them with existing data-intensive frameworks such as the Apache big data stack is not straightforward because of the challenges this integration involves. This has triggered research on seamlessly integrating the two paradigms through an interoperable framework, programming model, and system architecture. The aim of this paper is to assess the progress made in the HPC world toward augmenting it with big data analytics support. As an outcome, a taxonomy of the factors to be considered when augmenting HPC systems with big data support is put forth. This paper sheds light on how big data frameworks can be ported to HPC platforms as a preliminary step toward the convergence of the big data and exascale computing ecosystems. The focus is on research issues related to augmenting HPC paradigms with big data frameworks and the corresponding approaches to address those issues. This paper also discusses data-intensive and compute-intensive processing paradigms, benchmark suites and workloads, and future directions in integrating HPC with big data analytics.

Full Text