Abstract

With the convergence of high performance computing (HPC) and big data, processing large volumes of scientific data on HPC systems is attracting increased attention. However, supporting these data-intensive workloads on HPC systems that are geared toward compute-intensive workloads presents a new challenge in data management. Typical HPC systems consist of a large collection of compute nodes and use parallel file systems (PFSs) for persistent data storage. Although PFSs provide concurrent I/O bandwidth and perform well for large sequential write/read requests, their performance is bottlenecked by expensive metadata operations. Moreover, data-intensive applications that exhibit bursty I/O patterns or generate a large number of temporary files exacerbate these shortcomings. In this paper, we propose Pream, a lightweight metadata management framework that aims to address these challenges. Pream targets scenarios in which data-intensive workloads generate a huge number of temporary files on diskless compute nodes. Pream pre-allocates file metadata from the metadata server and manages it locally to accelerate metadata operations. While newly created temporary files still reside in the PFS, open/create requests for these files can be handled by Pream locally without contacting the PFS. Our evaluation demonstrates that Pream outperforms Lustre on many workloads and effectively reduces the latency of metadata operations.
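The sketch below illustrates the pre-allocation idea described above: a client draws reserved metadata entries from a locally held batch, so creating a temporary file needs no round trip to the metadata server until the batch is exhausted. All names and sizes (e.g., mds_reserve_batch, BATCH_SIZE) are illustrative assumptions, not Pream's actual interface.

```c
/* Minimal, hypothetical sketch of client-side metadata pre-allocation.
 * Names and constants are illustrative assumptions, not Pream's API. */
#include <stdio.h>
#include <stdint.h>

#define BATCH_SIZE 64            /* metadata entries fetched per server round trip */
#define NAME_MAX_LEN 128

typedef struct {
    uint64_t inode;              /* pre-reserved inode number from the metadata server */
    char     name[NAME_MAX_LEN];
    int      in_use;
} local_meta_t;

static local_meta_t pool[BATCH_SIZE];
static int pool_next = BATCH_SIZE;          /* start empty to force an initial refill */
static uint64_t next_reserved_inode = 1000; /* stand-in for the server-side counter */

/* Stand-in for one RPC to the metadata server: reserve a batch of inodes. */
static void mds_reserve_batch(void)
{
    for (int i = 0; i < BATCH_SIZE; i++) {
        pool[i].inode = next_reserved_inode++;
        pool[i].in_use = 0;
        pool[i].name[0] = '\0';
    }
    pool_next = 0;
}

/* Create a temporary file locally: one server RPC is amortized over BATCH_SIZE creates. */
static local_meta_t *local_create(const char *name)
{
    if (pool_next >= BATCH_SIZE)
        mds_reserve_batch();
    local_meta_t *m = &pool[pool_next++];
    snprintf(m->name, sizeof(m->name), "%s", name);
    m->in_use = 1;
    return m;
}

int main(void)
{
    for (int i = 0; i < 5; i++) {
        char name[NAME_MAX_LEN];
        snprintf(name, sizeof(name), "tmp_%d.dat", i);
        local_meta_t *m = local_create(name);
        printf("created %s -> inode %llu (handled locally)\n",
               m->name, (unsigned long long)m->inode);
    }
    return 0;
}
```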
