Abstract

This work investigates the feasibility of a Singularity container-based solution to support a customizable computing environment for running users' MPI applications in “hybrid” MPI mode-where the MPI on the host machine works in tandem with MPI inside the container-on NASA's High-End Computing Capability (HECC) resources. Two types of real-world applications were tested: traditional High-Performance Computing (HPC) and Artificial Intelligence/Machine Learning (AI/ML). On the traditional HPC side, two JEDI containers built with Intel MPI for Earth science modeling were tested on both HECC in-house and HECC AWS Cloud CPU resources. On the AI/ML side, a NVIDIA TensorFlow container built with OpenMPI was tested with a Neural Collaborative Filtering recommender system and the ResNet-50 computer image system on the HECC in-house V100 GPUs. For each of these applications and resource environments, multiple hurdles were overcome after lengthy debugging efforts. Among them, the most significant ones were due to the conflicts between a host MPI and a container MPI and the complexity of the communication layers underneath. Although porting containers to run with a single node using just the container MPI is quite straightforward, our exercises demonstrate that running across multiple nodes in hybrid MPI mode requires knowledge of Singularity, MPI libraries, the operating system image, and the communication infrastructure such as the transport and network layers, which are traditionally handled by support staff of HPC centers and hardware or software vendors. In conclusion, porting and running Singularity containers on HECC resources or other data centers with similar environments is feasible but most users would need help to run them in hybrid MPI mode.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.