LiU: Hiding Disk Access Latency for HPC Applications with a New SSD-Enabled Data Layout

Dachuan Huang, Song Jiang, Xuechen Zhang, Feng Qin, Mai Zheng, Wei Shi

doi:10.1109/mascots.2013.19

Abstract

Unlike in consumer electronics and personal computing, hard disks in HPC environments cannot easily be replaced by SSDs. The reasons include the hard disk's large capacity, very low price, and decent peak throughput. However, when latency dominates I/O performance (e.g., when accessing random data), the hard disk's performance can be compromised. If the issue of high latency could be effectively solved, the HPC community would enjoy large, affordable, and fast storage without having to replace disks completely with expensive SSDs. In this paper, we propose LiU, an almost latency-free, hard-disk-dominated storage system for HPC. The key technique is to leverage a limited amount of SSD storage for its low-latency access and to change the data layout in a hybrid storage hierarchy, with the low-latency SSD at the top and the high-latency hard disk at the bottom. If a segment of data will be randomly accessed, we lift its top part (the head) up in the hierarchy to the SSD and leave the remaining part (the body) untouched on the disk. As a result, the latency of accessing the whole segment can be removed, because the access latency of the body is hidden by the access time of the head on the SSD. Combined with the effect of prefetching a large segment, LiU (Lift it Up) can effectively remove disk access latency, so that the disk's high peak throughput can be fully exploited by data-intensive HPC applications. We have implemented a prototype of LiU in the PVFS parallel file system and evaluated it with representative MPI-IO micro-benchmarks, including mpi-io-test, mpi-tile-io, and ior-mpi-io, and one macro-benchmark, BTIO. Our experimental results show that LiU can effectively improve I/O performance for HPC applications, improving throughput by a factor of up to 5.8. Furthermore, LiU brings much greater benefit to sequential-I/O MPI applications when they suffer interference from other workloads. For example, LiU improves the I/O throughput of mpi-io-test under interference by 1.1 to 3.4 times, while improving the same workload without interference by 15%.
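
The head/body idea described in the abstract can be illustrated with a small timing model. The following is a minimal sketch, not the LiU implementation: it assumes that the disk begins positioning for the body at the same moment the head starts to be served from the SSD, that SSD access latency is negligible, and that each device delivers data at a constant rate. All function and parameter names are hypothetical and chosen only for illustration.

```python
import math


def min_head_bytes(disk_latency_s, head_rate_bps):
    """Smallest head whose delivery time from the SSD covers the disk latency.

    Assumption: the head is delivered at a constant rate (bytes per second).
    """
    return math.ceil(disk_latency_s * head_rate_bps)


def segment_time_disk_only(segment_bytes, disk_latency_s, disk_rate_bps):
    """Time to read a whole segment from disk: pay the latency, then stream."""
    return disk_latency_s + segment_bytes / disk_rate_bps


def segment_time_with_head(segment_bytes, head_bytes, disk_latency_s,
                           disk_rate_bps, head_rate_bps):
    """Time to read a segment whose head is on the SSD and body is on disk.

    Assumption: the disk positions itself for the body while the head is
    being served, so the body starts streaming once both the head has been
    delivered and the disk latency has elapsed.
    """
    head_bytes = min(head_bytes, segment_bytes)
    head_time = head_bytes / head_rate_bps
    body_time = (segment_bytes - head_bytes) / disk_rate_bps
    return max(head_time, disk_latency_s) + body_time
```

Under these assumptions, once the head is at least `min_head_bytes(...)` long, the `max(...)` term equals the head's own delivery time, so the disk latency no longer appears as a separate cost in the segment's access time. The real system's head size and scheduling may be chosen differently.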

Full Text