Abstract

High energy physics (HEP) is moving towards experiments with extremely high statistics and very large-scale theoretical simulations. To cope with rapidly growing data volumes, distributed computing and storage frameworks from the Big Data field, such as Hadoop and Spark, make computations easy to scale out. However, Spark's programming model, which is based on in-memory resilient distributed datasets (RDDs), assumes that a workload performs mostly local computation with only rare message exchange. This makes it inefficient for some HEP use cases, because several scientific computations, such as partial wave analysis (PWA) and lattice quantum chromodynamics (LQCD), are based on numerical linear algebra and iterative algorithms that rely on message passing between tasks. In this paper, we present Blaze, a computing system that modifies Spark to support OpenMPI and acts as a unified system that integrates MPI into the DAG and provides a task scheduling policy. The results indicate that inter-task message passing compensates for the limited expressiveness of the Spark model. In addition, Blaze gives MPI the ability to perform data-locality-aware computing and provides a solution for fault tolerance.
