Accurately modeling real-world systems requires scientific applications at exascale to generate massive amounts of data and to manage data storage efficiently. However, parallel input and output (I/O) faces challenges from new application workflows and from the state-of-the-art memory, interconnect, and storage architectures considered in exascale designs. The storage hierarchy has expanded to include node-local persistent memory, solid-state storage, and traditional disk- and tape-based storage, requiring efficient use of each layer and far more efficient data movement among the layers. This paper discusses how the ExaHDF5 project improved I/O performance and data management on exascale architectures by enhancing HDF5, a widely used parallel I/O library. The team developed an Asynchronous I/O Virtual Object Layer (VOL) connector that allows applications to overlap I/O with computation. They also created a Cache VOL that complements asynchronous I/O by incorporating fast storage layers, such as burst buffers and node-local storage, into the parallel I/O workflow by caching and staging data. Additionally, the team enabled node-level data aggregation and I/O through a Subfiling Virtual File Driver (VFD). To demonstrate the resulting I/O performance of HDF5 at exascale, the ExaHDF5 team collaborated with several exascale application teams. In this paper, we show I/O performance improvements for three applications: Cabana (a particle-based simulation library), EQSIM (a regional earthquake simulation software), and E3SM (an Earth system model).
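As a concrete illustration of the asynchronous interface summarized above, the sketch below shows how an application might overlap a checkpoint write with its next computation phase using HDF5's event-set API (available in HDF5 1.13 and later). It assumes the asynchronous VOL connector has been loaded, for example via the HDF5_VOL_CONNECTOR and HDF5_PLUGIN_PATH environment variables; the file name, dataset name, and compute routine are hypothetical stand-ins, not code from the paper.

```c
/* Minimal sketch: overlap an HDF5 dataset write with computation using
 * the event-set (async) API. Illustrative only; assumes the async VOL
 * connector is loaded, e.g. HDF5_VOL_CONNECTOR=async. */
#include <hdf5.h>
#include <stdlib.h>

#define N 1024

/* Hypothetical stand-in for the application's next computation phase. */
static void do_compute_step(double *buf, size_t n) {
    for (size_t i = 0; i < n; i++) buf[i] *= 1.0001;
}

int main(void) {
    double *checkpoint = malloc(N * sizeof(double));
    double *next       = malloc(N * sizeof(double));
    for (size_t i = 0; i < N; i++) checkpoint[i] = next[i] = (double)i;

    /* An event set tracks all asynchronous operations issued against it. */
    hid_t es_id = H5EScreate();

    hsize_t dims[1] = {N};
    hid_t space = H5Screate_simple(1, dims, NULL);

    hid_t file = H5Fcreate_async("checkpoint.h5", H5F_ACC_TRUNC,
                                 H5P_DEFAULT, H5P_DEFAULT, es_id);
    hid_t dset = H5Dcreate_async(file, "step0", H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT, es_id);

    /* The write returns immediately; the connector performs the I/O in
     * the background while the application keeps computing. */
    H5Dwrite_async(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                   H5P_DEFAULT, checkpoint, es_id);

    /* Compute on a *different* buffer: 'checkpoint' must stay untouched
     * until the pending write is known to have completed. */
    do_compute_step(next, N);

    H5Dclose_async(dset, es_id);
    H5Fclose_async(file, es_id);

    /* Block until every operation in the event set has completed. */
    size_t num_in_progress;
    hbool_t err_occurred;
    H5ESwait(es_id, H5ES_WAIT_FOREVER, &num_in_progress, &err_occurred);
    H5ESclose(es_id);

    H5Sclose(space);
    free(checkpoint);
    free(next);
    return err_occurred ? 1 : 0;
}
```

The key design point is that H5Dwrite_async returns immediately and H5ESwait is the only blocking call, so the application decides where I/O completion is synchronized and can fill the intervening time with useful computation.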