Abstract

Data-intensive knowledge discovery requires scientific applications to run concurrently with analytics and visualization codes executing in situ for timely output inspection and knowledge extraction. Consequently, the I/O pipelines of scientific workflows can be long and complex, comprising many stages of analytics across different layers of the I/O stack of high-performance computing systems. A performance limitation at any I/O layer or stage can create an I/O bottleneck, resulting in higher-than-expected end-to-end I/O latency. In this paper, we present the design and implementation of a novel system-level data management infrastructure called Software-Defined Storage Resource Enclaves (SIREN) that enforces end-to-end policies governing an I/O pipeline's performance. SIREN provides an I/O performance interface for users to specify the desired storage resources in the context of in-situ analytics. If an I/O bottleneck degrades analytics performance as data are transferred between simulations and analytics, schedulers at each layer of the I/O stack automatically enforce guaranteed lower bounds on I/O throughput. Our experimental results demonstrate that SIREN provides performance isolation among scientific workflows sharing multiple storage servers across two I/O layers (burst buffer and parallel file system) while maintaining high system scalability and resource utilization.
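The abstract describes the performance interface only at a high level. To make the idea concrete, the following is a minimal hypothetical sketch in C of what reserving an end-to-end throughput floor for an enclave might look like; every identifier here (siren_policy_t, siren_reserve, the layer flags) is an illustrative assumption, not SIREN's actual API.

    /* Hypothetical sketch only: these names do not come from the paper.
     * They illustrate declaring a lower bound on end-to-end I/O
     * throughput that per-layer schedulers would then enforce. */
    #include <stdio.h>

    /* I/O layers an enclave spans (assumed, mirroring the abstract's
     * burst-buffer and parallel-file-system layers). */
    enum siren_layer { SIREN_BURST_BUFFER = 1 << 0,
                       SIREN_PFS          = 1 << 1 };

    typedef struct {
        const char *workflow_id;  /* pipeline to isolate             */
        unsigned    layers;       /* bitmask of enum siren_layer     */
        double      min_mbps;     /* guaranteed throughput floor     */
    } siren_policy_t;

    /* Stub: a real implementation would register the policy with the
     * schedulers at each I/O layer; here we only echo it. */
    static int siren_reserve(const siren_policy_t *p)
    {
        printf("reserve %s: >= %.0f MB/s across layers 0x%x\n",
               p->workflow_id, p->min_mbps, p->layers);
        return 0;
    }

    int main(void)
    {
        siren_policy_t policy = {
            .workflow_id = "climate-sim+in-situ-viz",
            .layers      = SIREN_BURST_BUFFER | SIREN_PFS,
            .min_mbps    = 2000.0,  /* lower bound on I/O throughput */
        };
        return siren_reserve(&policy);
    }

Under this reading, the user states only the throughput floor and the layers involved, and the system decides how to apportion storage-server resources at each layer to honor it.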
