Abstract

It is increasingly important for next-generation exascale supercomputers to extend their applications beyond traditional HPC (high-performance computing) scenarios, so as to achieve greater social and economic benefit. Similar to AWS (Amazon Web Services) and Alibaba Cloud, a cloud-style virtual HPC service is a promising application scenario for supercomputers, in which remote block storage is the key to providing tenants with the supercomputer's extremely high storage performance. Unfortunately, state-of-the-art block storage systems (such as URSA and Ceph) cannot take advantage of the advanced hardware features of supercomputers. This paper presents UrsaX, an efficient block storage service for our next-generation Tianhe exascale supercomputer, which is equipped with the high-performance GLEX network and NVMe (Non-Volatile Memory Express) SSDs. UrsaX's virtual disks, which can be mounted like normal physical disks, enable not only traditional HPC applications but also supercomputer-oblivious POSIX applications to enjoy the high performance of supercomputers. At the core of UrsaX is a novel design that efficiently integrates on-disk block I/O with in-network message transfer on supercomputers. UrsaX utilizes the NVMe over Fabrics (NVMe-oF) kernel module to extend the NVMe standard onto the supercomputer network, and separates block metadata I/O from data I/O, handling them over the MP (Mini Packet) and RDMA (Remote Direct Memory Access) protocols, respectively. We thoroughly explore the design space for remote block storage on supercomputers, including parallelism, scalability, fault tolerance, and consistency. We conduct an extensive evaluation on a subset of our exascale supercomputer consisting of 44 storage machines, each with four NVMe SSDs. The results show that UrsaX achieves local-storage-level I/O latency (tens of microseconds) while linearly increasing aggregate performance (IOPS and throughput) with system scale, an order of magnitude higher than that of state-of-the-art block storage systems.
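To make the metadata/data split concrete, below is a minimal, hypothetical sketch in C of how a client might route a block write: the small, latency-sensitive metadata travels over an MP-style channel, while the bulk payload travels over an RDMA-style channel. The functions mp_send() and rdma_write() and the struct block_req layout are illustrative stand-ins (stubbed here so the sketch compiles), not GLEX's or UrsaX's actual APIs.

```c
/*
 * Hypothetical sketch (NOT the real GLEX/UrsaX API): route small
 * metadata messages over an MP-style channel and bulk data over an
 * RDMA-style channel, as described in the abstract.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct block_req {
    uint64_t vdisk_id;  /* target virtual disk */
    uint64_t offset;    /* byte offset within the virtual disk */
    uint32_t len;       /* payload length in bytes */
};

/* Stub stand-in for a low-latency mini-packet (MP) send. */
static int mp_send(int server, const void *msg, size_t len)
{
    (void)msg;
    printf("MP   -> server %d: %zu-byte metadata message\n", server, len);
    return 0;
}

/* Stub stand-in for a zero-copy bulk RDMA write. */
static int rdma_write(int server, const void *buf, size_t len, uint64_t raddr)
{
    (void)buf;
    printf("RDMA -> server %d: %zu-byte payload to 0x%llx\n",
           server, len, (unsigned long long)raddr);
    return 0;
}

/*
 * Submit one block write: metadata (virtual disk, offset, length)
 * goes over the MP path; the data payload goes over the RDMA path.
 */
static int submit_write(int server, const struct block_req *req,
                        const void *data, uint64_t remote_addr)
{
    int err = mp_send(server, req, sizeof(*req));
    if (err)
        return err;
    return rdma_write(server, data, req->len, remote_addr);
}

int main(void)
{
    char payload[4096];
    memset(payload, 0xab, sizeof(payload));

    struct block_req req = { .vdisk_id = 7, .offset = 1 << 20,
                             .len = sizeof(payload) };
    return submit_write(/*server=*/0, &req, payload, /*remote_addr=*/0x1000);
}
```

Separating the two paths lets small control messages avoid the setup overhead of bulk transfers, while large payloads move without intermediate copies, which is consistent with the tens-of-microseconds latency and linear scaling reported above.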
