Abstract

Purpose: To develop generic optimization strategies for image reconstruction using graphical processing units (GPUs) in magnetic resonance imaging (MRI) and to exemplarily report on our experience with a highly accelerated implementation of the nonlinear inversion (NLINV) algorithm for dynamic MRI with high frame rates.

Methods: The NLINV algorithm is optimized and ported to run on a multi-GPU single-node server. The algorithm is mapped to multiple GPUs by decomposing the data domain along the channel dimension. Furthermore, the algorithm is decomposed along the temporal domain by relaxing a temporal regularization constraint, allowing the algorithm to work on multiple frames in parallel. Finally, an autotuning method is presented that is capable of combining different decomposition variants to achieve optimal algorithm performance in different imaging scenarios.

Results: The algorithm is successfully ported to a multi-GPU system and allows online image reconstruction with high frame rates. Real-time reconstruction with low latency and frame rates up to 30 frames per second is demonstrated.

Conclusion: Novel parallel decomposition methods are presented which are applicable to many iterative algorithms for dynamic MRI. Using these methods to parallelize the NLINV algorithm on multiple GPUs, it is possible to achieve online image reconstruction with high frame rates.

Highlights

  • We summarize our experience in developing a low-latency online reconstruction system for real-time magnetic resonance imaging (MRI) over the last eight years

  • Although we focus on the specific application of real-time MRI, many of the techniques developed for this project can be applied to similar tomographic reconstruction problems

Introduction

Accelerators such as graphical processing units (GPUs) or other multicore vector coprocessors are well-suited for achieving fast algorithm run times when applied to reconstruction problems in medical imaging [1], including computed tomography [2], positron emission tomography [3], and ultrasound [4]. This is because the respective algorithms usually apply a massive number of the same independent operations to pixels, voxels, bins, or sampling points. We describe strategies for the optimal choice of the grid size for a convolution-based nonuniform FFT, novel parallelization schemes using temporal and spatial decomposition, and automatic tuning of parameters.
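To make the grid-size question concrete, the following is a minimal sketch of one common heuristic, not necessarily the strategy used in the paper. It assumes that the FFT library is fastest for transform lengths whose prime factors are all small (2, 3, 5, 7) and that the gridding step requires some minimum oversampling. The function names and the `min_oversampling` parameter are hypothetical and chosen only for illustration.

```python
import math


def next_smooth_size(n, primes=(2, 3, 5, 7)):
    """Return the smallest integer >= n whose prime factors all lie in `primes`."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    candidate = n
    while True:
        rest = candidate
        # Divide out every allowed prime; a remainder of 1 means the
        # candidate has no other prime factors.
        for p in primes:
            while rest % p == 0:
                rest //= p
        if rest == 1:
            return candidate
        candidate += 1


def oversampled_grid_size(matrix_size, min_oversampling=1.5):
    """Pick an FFT-friendly grid size that meets a minimum oversampling factor.

    Assumption: `min_oversampling` is an illustrative parameter, not a value
    taken from the paper.
    """
    return next_smooth_size(math.ceil(matrix_size * min_oversampling))


if __name__ == "__main__":
    for size in (128, 170, 257):
        print(size, "->", oversampled_grid_size(size))
```

Rounding up to a size with small prime factors trades a slightly larger grid for a faster transform; whether that trade pays off depends on the hardware and library, which is one reason such parameters are candidates for automatic tuning.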
