Motion-compensated image reconstruction enables new clinical applications of Magnetic Resonance Imaging (MRI), but it relies on computationally intensive algorithms. This study focuses on the Generalized Reconstruction by Inversion of Coupled Systems (GRICS) program, applied to the reconstruction of 3D images in cases of non-rigid or rigid motion. It uses hybrid parallelization with the MPI (Message Passing Interface) and OpenMP (Open Multi-Processing). For clinical integration, the GRICS needs to efficiently harness the computational resources of compute nodes. We aim to improve the GRICS’s performance without any code modification. This work presents a performance study of GRICS on two CPU architectures: Intel Xeon Gold and AMD EPYC. The roofline model is used to study the software–hardware interaction and quantify the code’s performance. For CPU–GPU comparison purposes, we propose a preliminary MATLAB–GPU implementation of the GRICS’s reconstruction kernel. We establish the roofline model of the kernel on two NVIDIA GPU architectures: Quadro RTX 5000 and A100. After the performance study, we propose some optimization patterns for the code’s execution on CPUs, first considering only the OpenMP implementation using thread binding and affinity and appropriate architecture-compilation flags and then looking for the optimal combination of MPI processes and OpenMP threads in the case of the hybrid MPI–OpenMP implementation. The results show that the GRICS performed well on the AMD EPYC CPUs, with an architectural efficiency of 52%. The kernel’s execution was fast on the NVIDIA A100 GPU, but the roofline model reported low architectural efficiency and utilization.
Read full abstract