Abstract
Tensor Cores are specialized hardware units added to recent NVIDIA GPUs to speed up matrix-multiplication-related tasks, such as convolutions and densely connected layers in neural networks. Because of their specific hardware implementation and programming model, Tensor Cores cannot be applied straightforwardly to applications outside machine learning. In this paper, we demonstrate the feasibility of using NVIDIA Tensor Cores to accelerate a non-machine-learning application: iterative Computed Tomography (CT) reconstruction. For large CT images and real-time CT scanning, the reconstruction time of many existing iterative reconstruction methods is relatively high, ranging from seconds to minutes depending on the image size. CT reconstruction is therefore an application area that could benefit from Tensor Core hardware acceleration. We first studied the reconstruction algorithm's performance as a function of the hardware-related parameters and proposed an approach to accelerate reconstruction on Tensor Cores. The results show that the proposed method provides about a fivefold increase in speed and energy saving on an NVIDIA RTX 2080 Ti GPU for the parallel projection of 32 images of size 512 × 512. The relative reconstruction error due to the mixed-precision computations was almost equal to that of single-precision (32-bit) floating-point computations. We then presented an approach for real-time and memory-limited applications that exploits the symmetry of the system (i.e., the acquisition geometry). As the proposed approach is based on the conjugate gradient method, it can be generalized to many research and industrial fields.
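As a point of reference for the programming model mentioned above, the sketch below shows a minimal mixed-precision tile multiply on Tensor Cores using the CUDA WMMA API, with FP16 inputs and FP32 accumulation (the mixed-precision mode the error claim above refers to). It is illustrative only, not the paper's kernel; the kernel name, the single-warp launch, and the fixed 16×16×16 tile shape are our assumptions.

```cuda
// Minimal WMMA sketch (not the authors' kernel): one warp computes a
// single 16x16 output tile C = A * B on Tensor Cores.
// Inputs are FP16; the accumulator is FP32 (mixed precision).
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

__global__ void wmma_tile_gemm(const half *A, const half *B, float *C) {
    // Fragments for a fixed 16x16x16 tile shape (illustrative choice).
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);      // zero the FP32 accumulator
    wmma::load_matrix_sync(a_frag, A, 16);  // leading dimension = 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // Tensor Core MMA
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```

A single warp computes the tile, e.g. launched as `wmma_tile_gemm<<<1, 32>>>(dA, dB, dC);`. WMMA requires compute capability 7.0 or higher, which the RTX 2080 Ti (compute capability 7.5) satisfies.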
Highlights
Graphics Processing Units (GPUs), among the most practical massively parallel processors, have proven their power in facilitating research in a wide range of fields, including high-performance computing, data centers, medical imaging, and machine learning
We demonstrate the application of NVIDIA Tensor Cores to accelerating Computed Tomography (CT) forward-projection (FP) and back-projection (BP) algorithms, which are among the most demanding kernels in iterative reconstruction approaches
The experimental setup and the image and projection sizes considered are explained
Summary
Graphics Processing Units (GPUs), among the most practical massively parallel processors, have proven their power in facilitating research in a wide range of fields, including high-performance computing, data centers, medical imaging, and machine learning. GPU-based machine learning applications, and deep learning in particular, have grown significantly in use in recent years [18]. To address this need, NVIDIA introduced a specialized computing unit, the Tensor Core, that speeds up neural-network workloads.

The distance-driven method, on the other hand, converts the projection–backprojection problem into a one-dimensional re-sampling problem by mapping both the image pixel boundaries and the detector cell boundaries onto a common axis. The resampled value between two neighbouring samples P0 and P1 is a linear equation in P0 and P1. This means that the distance-driven projection describes a linear transformation from the image domain to the projection domain.
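To make the linearity argument concrete, the interpolation step can be written out as below; the symbols t, f, p, and W are our own notation, not taken from the text above.

```latex
% Assumed notation: t \in [0,1] is the fractional position of the
% resampled point between the neighbouring samples P_0 and P_1
% on the common axis.
P(t) = (1 - t)\,P_0 + t\,P_1
% Each detector value is thus a fixed weighted sum of image values,
% so the full forward projection is a matrix-vector product
% p = W f : a linear map from the image f to the projections p.
```

It is this matrix form of the projection operators that makes a Tensor Core formulation of FP and BP plausible in the first place.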