Predicting and Comparing the Performance of Array Management Libraries

Donghe Kang, Spyros Blanas, Suren Byna, Oliver Rubel

doi:10.1109/ipdps47924.2020.00097

Abstract

Many applications are increasingly becoming I/O-bound. To improve scalability, analytical models of parallel I/O performance are often consulted to determine possible I/O optimizations. However, I/O performance modeling has predominantly focused on applications that directly issue I/O requests to a parallel file system or a local storage device. These I/O models are not directly usable by applications that access data through standardized I/O libraries, such as HDF5, FITS, and NetCDF, because a single I/O request to an object can trigger a cascade of I/O operations to different storage blocks. The I/O performance characteristics of applications that rely on these libraries are a complex function of the underlying data storage model, user-configurable parameters, and object-level access patterns. As a consequence, I/O optimization is predominantly an ad hoc process performed by application developers, who are often domain scientists with limited desire to delve into the nuances of the storage hierarchy of modern computers. This paper presents an analytical cost model to predict the end-to-end execution time of applications that perform I/O through established array management libraries. The paper focuses on the HDF5 and Zarr array libraries as examples of I/O libraries with radically different storage models: HDF5 stores every object in one file, while Zarr creates multiple files to store different objects. We find that accessing array objects via these I/O libraries introduces new overheads as well as new optimization opportunities. Specifically, in addition to I/O time, it is crucial to model the cost of transforming data to a particular storage layout (memory copy cost), as well as the benefit of accessing a software cache. We evaluate the model on real applications that process observations (neuroscience) and simulation results (plasma physics).
The evaluation on three HPC clusters reveals that I/O accounts for as little as 10% of the execution time in some cases, and hence models that only focus on I/O performance cannot accurately capture the performance of applications that use standard array storage libraries. In parallel experiments, our model correctly predicts the fastest storage library between HDF5 and Zarr 94% of the time, in contrast with 70% of the time for a cutting-edge I/O model.
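
The abstract notes that a single request to an array object can fan out into I/O on several storage blocks. Below is a minimal sketch of that effect for a chunked n-dimensional array. It rests on two assumptions of ours, not the paper's: the selection is a rectangular box given by `start` and `count`, and the library always reads whole chunks (as it must when chunks are compressed). In HDF5 the touched chunks live inside one file, while in Zarr's default directory layout each chunk is a separate file. The function names `chunks_touched` and `bytes_read` are hypothetical.

```python
from itertools import product
from math import prod

def chunks_touched(start, count, chunk_shape):
    """Chunk-grid coordinates overlapped by the box [start, start + count)."""
    ranges = []
    for s, c, cs in zip(start, count, chunk_shape):
        if c <= 0:
            return []  # an empty selection touches no chunks
        first = s // cs           # chunk holding the first selected element
        last = (s + c - 1) // cs  # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))

def bytes_read(start, count, chunk_shape, itemsize):
    """Bytes fetched from storage, assuming every touched chunk is read whole."""
    return len(chunks_touched(start, count, chunk_shape)) * prod(chunk_shape) * itemsize

# 100 x 100 chunks of 8-byte values; request the 100 x 100 box starting at (50, 50).
print(chunks_touched((50, 50), (100, 100), (100, 100)))  # → [(0, 0), (0, 1), (1, 0), (1, 1)]
print(bytes_read((50, 50), (100, 100), (100, 100), 8))   # → 320000
```

The misaligned box touches 4 chunks and fetches 320,000 bytes to deliver 80,000, a fourfold read amplification that an I/O model counting only requested bytes would miss.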
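
The abstract argues that end-to-end cost must include memory copies into the requested layout and the savings from the library's software cache, not just I/O time. Below is a toy version of that idea, not the paper's model. The cost terms, the per-file open latency used to separate one HDF5 file from many Zarr chunk files, and every name (`SystemParams`, `AccessPlan`, `predict_time`, `fastest`, `selection_accuracy`) are hypothetical. The sketch also ignores parallel contention and metadata effects that a real model would need.

```python
from dataclasses import dataclass

@dataclass
class SystemParams:
    # Hypothetical calibration constants, not values from the paper.
    read_bandwidth: float    # bytes/s delivered by the file system
    request_latency: float   # seconds of fixed overhead per chunk read
    open_latency: float      # seconds per file open
    memcpy_bandwidth: float  # bytes/s for copying chunk data into the user's layout

@dataclass
class AccessPlan:
    chunks_read: int         # chunks the selection touches
    chunk_bytes: int         # uncompressed bytes per chunk
    files_opened: int        # e.g. 1 for a single HDF5 file, one per chunk for Zarr
    cache_hit_ratio: float   # fraction of chunk reads served from the library cache

def predict_time(plan: AccessPlan, p: SystemParams) -> float:
    """Toy end-to-end time: file opens + uncached chunk I/O + memory copies."""
    misses = plan.chunks_read * (1.0 - plan.cache_hit_ratio)
    t_open = plan.files_opened * p.open_latency
    t_io = misses * (p.request_latency + plan.chunk_bytes / p.read_bandwidth)
    # Cached or not, every touched chunk is copied into the destination buffer.
    t_copy = plan.chunks_read * plan.chunk_bytes / p.memcpy_bandwidth
    return t_open + t_io + t_copy

def fastest(plans: dict, p: SystemParams) -> str:
    """Name of the library with the lowest predicted time."""
    return min(plans, key=lambda name: predict_time(plans[name], p))

def selection_accuracy(predicted: list, measured: list) -> float:
    """Fraction of configurations where the predicted winner matches the measured one."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    return sum(a == b for a, b in zip(predicted, measured)) / len(measured)

params = SystemParams(read_bandwidth=1e9, request_latency=1e-3,
                      open_latency=5e-3, memcpy_bandwidth=5e9)
plans = {
    "HDF5": AccessPlan(chunks_read=4, chunk_bytes=80_000, files_opened=1, cache_hit_ratio=0.0),
    "Zarr": AccessPlan(chunks_read=4, chunk_bytes=80_000, files_opened=4, cache_hit_ratio=0.0),
}
print(fastest(plans, params))  # → HDF5
```

With these made-up numbers HDF5 wins only because each extra file open costs 5 ms. Different parameters, cache behavior, or parallel contention can reverse the ranking, and choosing the faster library under such conditions is the task the paper's model is evaluated on.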

Journal: Proceedings. IPDPS (Conference)
Publication Date: May 1, 2020
Citations: 18

Similar Papers

Energy-efficient embedded software implementation on multiprocessor system-on-chip with multiple voltages
Shaoxiong Hua ... Gang Qu
ACM Transactions on Embedded Computing Systems | VOL. 5
01 May 2006

Quick Execution Time Predictions for Spark Applications
Sarah Shah ... Diwakar Krishnamurthy
01 Oct 2019

Locality-aware task scheduling for homogeneous parallel computing systems
Muhammad Khurram Bhatti ... Konstantin Popov
Computing | VOL. 100
01 Nov 2017

Matching and scheduling algorithms for minimizing execution time and failure probability of applications in heterogeneous computing
A Dogan ... F Ozguner
IEEE Transactions on Parallel and Distributed Systems | VOL. 13
01 Mar 2002
