Abstract

Drawing parallels to the rise of general-purpose graphics processing units (GPGPUs) as accelerators for specific high-performance computing (HPC) workloads, there has been a rise in the use of non-volatile memory (NVM) as an accelerator for I/O-intensive scientific applications. However, existing works have explored the use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to outpace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. In this work, we therefore investigate the co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit at these various levels of the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world out-of-core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.

Highlights

  • Purpose-built computing for the acceleration of scientific applications is gaining traction in clusters small and large across the globe, with general-purpose graphics processing units (GPGPUs) leading the charge

  • We begin by summarizing the science performed by the OoC application we use for evaluation, and detail the high-performance computing (HPC) architecture on which such computation is performed

  • While this use case is quite specialized, as we demonstrate in Section 4, it delivers such high performance relative to even tuned traditional file systems (sometimes in excess of 100%) that the extra work is repaid with considerable benefits in execution speed

Summary

Introduction

Purpose-built computing for the acceleration of scientific applications is gaining traction in clusters small and large across the globe, with general-purpose graphics processing units (GPGPUs) leading the charge. This trend is compelling, as the properties of modern SSDs firmly occupy the previously sprawling no man's land between main-memory and disk latency, a gap that without SSDs spans three orders of magnitude. By employing these SSDs alongside traditional magnetic storage on the I/O nodes (IONs) in the cluster, these works demonstrate that only a fraction of a large dataset need be kept in compute-node memory at any one time; new chunks of the dataset can be brought in over the network from the SSDs on the ION on an as-needed basis and without much delay to the algorithm. We present novel hardware and software solutions to these overheads, and finally demonstrate experimentally near-optimal performance of the compute-local NVM using a cycle-accurate NVM simulation framework, achieving a relative improvement of 10.3 times over traditional ION-local NVM solutions.
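
To make the out-of-core access pattern above concrete, the following is a minimal C sketch: a dataset far larger than memory is streamed through a single reusable buffer, one fixed-size chunk at a time, so only a fraction of the data is ever resident. The file name dataset.bin, the 64 MiB chunk size, and the compute_on_chunk placeholder are illustrative assumptions, not artifacts from the paper.

    /* Minimal sketch of out-of-core chunked processing:
     * only one chunk of the dataset is held in memory at a time;
     * the rest stays on storage until needed. */
    #define _POSIX_C_SOURCE 200809L
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CHUNK_BYTES (64UL * 1024 * 1024)  /* 64 MiB working chunk (illustrative) */

    static void compute_on_chunk(const char *buf, size_t len) {
        (void)buf; (void)len;  /* placeholder for the OoC compute kernel */
    }

    int main(void) {
        int fd = open("dataset.bin", O_RDONLY);  /* hypothetical dataset file */
        if (fd < 0) { perror("open"); return 1; }

        char *buf = malloc(CHUNK_BYTES);
        if (!buf) { close(fd); return 1; }

        off_t offset = 0;
        ssize_t n;
        /* Stream the file chunk by chunk through the single buffer. */
        while ((n = pread(fd, buf, CHUNK_BYTES, offset)) > 0) {
            compute_on_chunk(buf, (size_t)n);
            offset += n;
        }

        free(buf);
        close(fd);
        return 0;
    }

The same loop applies regardless of where the chunks physically live: fetched over the network from ION-resident SSDs, as in prior work, or from the compute-local NVM this paper advocates; the difference lies in the per-chunk transfer latency the loop must absorb.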

Background
Out-of-core scientific computing
HPC architecture
Non-volatile memory
Holistic system analysis
Architecture and software framework
File systems
Device protocols and interfaces
Evaluation
Experimental configuration
Architecture and file system results
Results of device improvement
Digging deeper
Related work
Conclusion and future work