Abstract
It is increasingly important for next-generation exascale supercomputers to extend their applications beyond traditional HPC (high-performance computing) scenarios in order to deliver greater social and economic benefits. Similar to the cloud services of AWS (Amazon Web Services) and Alibaba Cloud, cloud-style virtual HPC service is a promising application scenario for supercomputers, and remote block storage is key to giving tenants access to the extremely high storage performance of supercomputers. Unfortunately, state-of-the-art block storage systems (such as URSA and Ceph) cannot adapt to the advanced hardware features of supercomputers. This paper presents UrsaX, an efficient block storage service for our next-generation Tianhe exascale supercomputer, which is equipped with the high-performance GLEX network and NVMe (Non-Volatile Memory Express) SSDs. UrsaX's virtual disks can be mounted like ordinary physical disks, allowing not only traditional HPC applications but also supercomputer-oblivious POSIX applications to enjoy the high performance of supercomputers. At the core of UrsaX is a novel design that efficiently integrates on-disk block I/O with in-network message transfer on supercomputers. UrsaX uses the NVMe over Fabrics kernel module to extend the NVMe standard to the supercomputer network, and it separates block metadata I/O from data I/O, handling them over the MP (Mini Packet) and RDMA (Remote Direct Memory Access) protocols, respectively. We thoroughly explore the design space for remote block storage on supercomputers, including parallelism, scalability, fault tolerance, and consistency. We conduct an extensive evaluation on a subset of our exascale supercomputer consisting of 44 storage machines, each with four NVMe SSDs.
The results show that UrsaX achieves local-storage-level I/O latency (tens of microseconds) while scaling its aggregate performance (IOPS and throughput) linearly with system size, reaching an order of magnitude higher than state-of-the-art block storage systems.
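As a rough illustration of the metadata/data separation described above, the Python sketch below routes each block request to one of two transports according to its kind. It is a hypothetical toy, not UrsaX's implementation: the names `Kind`, `BlockRequest`, `RecordingTransport`, and `SplitPathDispatcher` are invented for illustration, and the transports merely record requests rather than modeling the GLEX MP or RDMA interfaces.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    METADATA = "metadata"  # hypothetical: small control/metadata messages
    DATA = "data"          # hypothetical: bulk block payloads


@dataclass
class BlockRequest:
    kind: Kind
    lba: int     # logical block address
    length: int  # request size in bytes


class RecordingTransport:
    """Hypothetical stand-in for a network path; it only records requests."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sent: list[BlockRequest] = []

    def send(self, req: BlockRequest) -> str:
        self.sent.append(req)
        return self.name  # report which path carried the request


class SplitPathDispatcher:
    """Sends metadata I/O over one transport and data I/O over another."""

    def __init__(self, metadata_path: RecordingTransport,
                 data_path: RecordingTransport) -> None:
        self.metadata_path = metadata_path
        self.data_path = data_path

    def submit(self, req: BlockRequest) -> str:
        # Choose the path purely by request kind, not by request size.
        path = self.metadata_path if req.kind is Kind.METADATA else self.data_path
        return path.send(req)


if __name__ == "__main__":
    mp = RecordingTransport("MP")
    rdma = RecordingTransport("RDMA")
    dispatcher = SplitPathDispatcher(mp, rdma)
    print(dispatcher.submit(BlockRequest(Kind.METADATA, 0, 64)))   # prints MP
    print(dispatcher.submit(BlockRequest(Kind.DATA, 8, 4096)))     # prints RDMA
```

Keeping the two paths behind a single `submit` call lets a caller issue block I/O without knowing which protocol carries it; how UrsaX actually classifies and transfers requests is described in the paper itself.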
More From: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems