Nonuniform Fast Fourier Transform on TPUs

Tianjian Lu, Yue Zhuo, Yi-Fan Chen, Thibault Marin, Chao Ma

doi:10.1109/isbi48211.2021.9434068

Abstract

This work presents a parallel algorithm for implementing the nonuniform fast Fourier transform (NUFFT) on Google's Tensor Processing Units (TPUs). The TPU is a hardware accelerator originally designed for deep learning applications. The NUFFT is considered the main computational bottleneck in magnetic resonance (MR) image reconstruction when k-space data are sampled on a nonuniform grid. Computing the NUFFT involves three operations: an apodization, an FFT, and an interpolation, all of which are formulated as tensor operations in order to fully exploit the TPU's strength in matrix multiplication. The implementation uses TensorFlow. Numerical examples show a 20x to 80x acceleration of the NUFFT on a single-card TPU compared to CPU implementations. A strong scaling analysis shows close-to-linear scaling of the NUFFT on up to 64 TPU cores. The proposed TPU implementation of the NUFFT is promising for accelerating MR image reconstruction and achieving practical runtimes for clinical applications.
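The three-stage structure named in the abstract (apodization, FFT, interpolation) can be illustrated with a minimal one-dimensional sketch. This is not the authors' TensorFlow or TPU code. It assumes a Gaussian gridding kernel, a uniform-to-nonuniform (type-2) transform, and NumPy in place of TensorFlow, none of which the abstract specifies. The function names `nufft_type2_1d` and `nudft_type2_1d` and the parameters `oversample` and `msp` are hypothetical. The interpolation step is written as a dense matrix product only to echo the idea of expressing each stage as a tensor operation.

```python
import numpy as np


def nufft_type2_1d(coeffs, points, oversample=2, msp=12):
    """Approximate c_j = sum_k coeffs[k] * exp(1j * k * points[j]).

    Illustrative sketch (assumed Gaussian kernel, 1D, type-2 transform).
    Modes k run over -n/2 .. n/2 - 1; n is assumed even.
    """
    n = coeffs.shape[0]
    m = oversample * n                     # size of the oversampled uniform grid
    k = np.arange(-n // 2, n // 2)         # integer mode indices
    # Gaussian width; a larger msp gives a smaller aliasing error.
    tau = np.pi * msp / (n**2 * oversample * (oversample - 0.5))

    # 1) Apodization: pre-compensate for the smoothing applied in step 3.
    scaled = coeffs * np.sqrt(np.pi / tau) * np.exp(tau * k**2)

    # 2) FFT: evaluate the scaled series on the oversampled uniform grid.
    padded = np.zeros(m, dtype=complex)
    padded[k % m] = scaled                 # negative modes wrap to the array end
    grid_values = m * np.fft.ifft(padded)

    # 3) Interpolation: Gaussian-weighted sum, written as a dense matmul.
    grid = 2.0 * np.pi * np.arange(m) / m
    dist = points[:, None] - grid[None, :]
    dist = (dist + np.pi) % (2.0 * np.pi) - np.pi   # periodic wrap to [-pi, pi)
    weights = np.exp(-dist**2 / (4.0 * tau)) / m
    return weights @ grid_values


def nudft_type2_1d(coeffs, points):
    """Direct O(n * len(points)) evaluation, used only as a reference."""
    n = coeffs.shape[0]
    k = np.arange(-n // 2, n // 2)
    return np.exp(1j * points[:, None] * k[None, :]) @ coeffs
```

Calling `nufft_type2_1d(coeffs, points)` with a complex array of an even number of coefficients and a real array of sample locations in [-pi, pi) should agree closely with `nudft_type2_1d(coeffs, points)`. A practical implementation would truncate the kernel to a few neighbouring grid points per sample rather than build the full dense weight matrix.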

Full Text