Abstract

Lightweight virtualization technologies are now widely deployed in data centers and HPC clusters to provide efficient and elastic resource provisioning. Virtualization has also been extended to the operating system's I/O stack. For example, the virtual switch has become the primary provider of I/O services for data movement among lightweight virtual instances, such as containers managed by Docker and Kubernetes. However, I/O stack virtualization introduces performance degradation and scalability bottlenecks into the data movements of HPC computing frameworks, such as MPI-based collective data movements and bursty asynchronous data movements. To study these bottlenecks, we quantify and analyze the performance degradation of HPC data movements on virtual clusters. We then design a set of two-stage methods that proactively adapt the virtual network and the data movement procedures, improving the performance of HPC collective data movements by up to 3×. In addition, we design a cross-layer middleware to improve the performance and scalability of bursty asynchronous data movements. Our evaluation shows that it improves the performance of a real scientific application by 34.6%.


