Abstract

Over the last two decades, scientific discovery has increasingly been driven by the wide availability of data from a multitude of sources, including high-resolution simulations, observations and instruments, as well as an enormous network of sensors and edge components. In such a dynamic and growing landscape, where data continue to expand, advances in science have become intertwined with the capacity of analysis tools to effectively handle and extract valuable information from this ocean of data. With the exascale era of supercomputers rapidly approaching, it is of the utmost importance to design novel solutions that can take full advantage of the upcoming computing infrastructures. The convergence of High Performance Computing (HPC) and data-intensive analytics is key to delivering scalable High Performance Data Analytics (HPDA) solutions for scientific and engineering applications. The aim of this paper is threefold: to review some of the most relevant challenges towards HPDA at scale, to present an HPDA-enabled version of the Ophidia framework, and to validate the scalability of the proposed framework through an experimental performance evaluation carried out in the context of the Centre of Excellence in Simulation of Weather and Climate in Europe (ESiWACE). The experimental results show that the proposed solution is capable of scaling over several thousand cores and hundreds of cluster nodes. The proposed work is a contribution in support of large-scale scientific applications along the wider convergence path of HPC and Big Data followed by the scientific research community.

The associate editor coordinating the review of this manuscript and approving it for publication was Dongxiao Yu.

Highlights

  • Over the last two decades, scientific discovery has increasingly been driven by the wide availability of data from a multitude of sources, including high-resolution simulations, observations and instruments, as well as an enormous network of sensors and edge components [1]. Thanks to the data deluge that started at the beginning of this century, data-intensive science has emerged as the fourth scientific paradigm [2], [3], paving the way towards the Big Data revolution, which broke out around 2010 and led to a new awareness of the multifaceted complexity and relevance of data.

  • Experimental evaluation and results: To analyze the scalability and performance of the proposed High Performance Data Analytics (HPDA) framework, an experimental evaluation has been conducted over a large-scale High Performance Computing (HPC) cluster.

  • The proposed tests focus on the evaluation of the strong and weak scalability of the provided HPDA runtime system with respect to a set of real-world analytics operations.
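
The distinction between strong and weak scalability mentioned in the last highlight can be made concrete with the standard textbook definitions. The following is a minimal sketch, not the evaluation code used in the paper: the function names and the timings in the usage note are hypothetical and chosen only for illustration. In a strong-scaling test the problem size is fixed while the core count grows, so the ideal run time shrinks in proportion to the cores; in a weak-scaling test the problem size grows with the core count, so the ideal run time stays constant.

```python
def strong_scaling(times):
    """Speedup and parallel efficiency for a fixed total problem size.

    `times` maps a core count to the measured wall-clock time in seconds.
    The smallest core count is taken as the baseline (an assumption of
    this sketch; a baseline larger than 1 core is also handled).
    Returns {cores: (speedup, efficiency)}.
    """
    base_cores = min(times)
    base_time = times[base_cores]
    result = {}
    for cores in sorted(times):
        speedup = base_time / times[cores]
        # Ideal speedup equals the growth in resources: cores / base_cores.
        efficiency = speedup / (cores / base_cores)
        result[cores] = (speedup, efficiency)
    return result


def weak_scaling(times):
    """Efficiency when the problem size grows in proportion to the cores.

    `times` maps a core count to the measured wall-clock time in seconds.
    With perfect weak scaling the time stays constant, so efficiency is
    the baseline time divided by the time at each core count.
    Returns {cores: efficiency}.
    """
    base_time = times[min(times)]
    return {cores: base_time / times[cores] for cores in sorted(times)}
```

As a usage illustration with made-up numbers, `strong_scaling({1: 100.0, 2: 50.0, 4: 30.0})` reports a speedup of about 3.33 and an efficiency of about 0.83 on 4 cores, while `weak_scaling({1: 10.0, 2: 10.0, 4: 12.5})` reports an efficiency of 0.8 on 4 cores.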


Summary

Introduction

Thanks to the data deluge that started at the beginning of this century, data-intensive science has emerged as the fourth scientific paradigm [2], [3], paving the way towards the Big Data revolution, which broke out around 2010 and led to a new awareness of the multifaceted complexity and relevance of data. As part of this process, the term Big Data, which originally referred to just a few orthogonal dimensions such as volume, velocity and variety [4], i.e. the most obvious and quantitative aspects of data, was later complemented and enriched with new dimensions. Over the years, the Big Data revolution gave rise to a vast software ecosystem able to foster a data-centric paradigm for scientific discovery, complementing and enriching the well-established simulation-centric paradigm that is mainly adopted by the HPC community [7].

