Abstract

HPC applications pose high demands on I/O performance and storage capability. Emerging non-volatile memory (NVM) techniques offer low latency, high bandwidth, and persistence for HPC applications. However, the existing I/O stack is designed and optimized under the assumption of disk-based storage. To use NVM effectively, we must re-examine the existing high performance computing (HPC) I/O sub-system and properly integrate NVM into it. With NVM as fast storage, the previous assumption of inferior storage performance (e.g., that of hard drives) is no longer valid. The performance problems caused by slow storage may be mitigated, and the existing mechanisms that narrow the performance gap between storage and CPU may become unnecessary and introduce large overhead. Fully understanding the impact of introducing NVM into the HPC software stack therefore demands a thorough performance study. In this paper, we analyze and model the performance of I/O-intensive HPC applications with NVM as a block device. We study performance from three perspectives: (1) the impact of NVM on the performance of the traditional page cache; (2) a performance comparison between MPI individual I/O and POSIX I/O; and (3) the impact of NVM on the performance of collective I/O. We reveal the diminishing effect of the page cache, the minor performance difference between MPI individual I/O and POSIX I/O, and the performance disadvantage of collective I/O on NVM due to unnecessary data shuffling. We also model the performance of MPI collective I/O and study the complex interaction between data shuffling, storage performance, and I/O access patterns.

Highlights

  • Modern HPC applications are often characterized by huge data sizes and intensive data processing

  • In a multi-node environment, MPI individual I/O has negligible performance overhead, even when we use non-volatile memory (NVM). This seems to indicate that the current implementation of MPI individual I/O is ready for future HPC systems equipped with emerging NVM

  • The design of MPI collective I/O is based on a fundamental assumption that the I/O block device is slow and pattern sensitive, such that the data shuffling cost can be outweighed by the performance benefit of MPI collective I/O
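The data shuffling the last highlight refers to is the first phase of two-phase collective I/O: processes exchange their non-contiguous pieces so that aggregators can issue large contiguous writes to storage. The sketch below is our own illustration, not code from the paper; it simulates the aggregation step for an assumed round-robin (interleaved) access pattern and counts the bytes copied during the shuffle, which is the extra cost that collective I/O pays and that no longer pays off on fast NVM.

```python
# Illustrative sketch (not the paper's code): the shuffle/aggregation phase
# of two-phase collective I/O, simulated in plain Python.
# Assumption: file blocks are distributed round-robin across processes,
# so file block (i * nprocs + p) belongs to chunk i of process p.

def two_phase_shuffle(per_proc_chunks, block_size):
    """Gather each process's interleaved blocks into one contiguous buffer.

    per_proc_chunks[p] is the ordered list of blocks held by process p;
    block_size is the size of each block in bytes.
    Returns (contiguous_buffer, bytes_copied): the buffer an aggregator
    would write sequentially, and the data-shuffling traffic it cost.
    """
    nprocs = len(per_proc_chunks)
    nblocks = sum(len(chunks) for chunks in per_proc_chunks)
    shuffled = bytearray(nblocks * block_size)
    copied = 0  # shuffle cost: every byte is moved once before the write
    for p, chunks in enumerate(per_proc_chunks):
        for i, chunk in enumerate(chunks):
            off = (i * nprocs + p) * block_size  # round-robin file offset
            shuffled[off:off + block_size] = chunk
            copied += block_size
    return bytes(shuffled), copied

# Example: 2 processes, 4-byte blocks, each process holds 2 blocks.
data, copied = two_phase_shuffle([[b"AAAA", b"BBBB"], [b"CCCC", b"DDDD"]], 4)
# data == b"AAAACCCCBBBBDDDD": one contiguous write instead of 4 seeks;
# copied == 16: every byte was shuffled once first.
```

On a seek-sensitive hard drive, turning four scattered writes into one sequential write easily repays the 16 bytes of shuffling; on NVM, where scattered writes are nearly as fast as sequential ones, the shuffle is pure overhead.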


Summary

Introduction

Modern HPC applications are often characterized by huge data sizes and intensive data processing. The Blue Brain project aims to simulate the human brain with a daunting 100PB of memory that needs to be revisited by the solver at every time step; the cosmology simulation to study the Q continuum works on 2PB per simulation. Both of these simulations require transformation of the data representation, which poses high demands on I/O performance and storage capability. Emerging non-volatile memory (NVM [18]) techniques, such as Phase Change Memory (PCM) [16] and STT-RAM [11], offer low-latency access, high bandwidth, and persistence. Their performance is much better than that of the traditional hard drive, and close to, or even matching, that of DRAM.

NVM Usage Model
Background
Benchmarks
PMBD Emulator
Impacts of Page Cache
Performance Study
Conclusions
Discussion
Findings
Related Work
