Reviewing GPU architectures to build efficient back projection for parallel geometries

Suren Chilingaryan, Alessandro Mirone, Andreas Kopmann, Evelina Ametova

doi:10.1007/s11554-019-00883-w

Abstract

Back-projection is the major algorithm in Computed Tomography to reconstruct images from a set of recorded projections. It is used for both fast analytical methods and high-quality iterative techniques. X-ray imaging facilities rely on back-projection to reconstruct internal structures in material samples and living organisms with high spatial and temporal resolution. Fast image reconstruction is also essential to track and control processes under study in real time. In this article, we present efficient implementations of the back-projection algorithm for parallel hardware. We survey a range of parallel architectures presented by the major hardware vendors during the last 10 years. We analyze the similarities and differences between these architectures and highlight how specific features can be used to enhance the reconstruction performance. In particular, we build a performance model to find hardware hotspots and propose several optimizations to balance the load between the texture engine, the computational and special function units, and the different types of memory, maximizing the utilization of all GPU subsystems in parallel. We further show that targeting architecture-specific features boosts performance by a factor of 2 to 7 compared to the current state-of-the-art algorithms used in standard reconstruction codes. The suggested load-balancing approach is not limited to back-projection but can be used as a general optimization strategy for implementing parallel algorithms.
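As a point of reference for the algorithm the abstract describes, the following is a minimal, unoptimized sketch of pixel-driven back-projection for parallel-beam geometry in plain Python. It is not the authors' implementation: the function names `sample` and `back_project`, the rotation axis placed at the center of both the image and the detector, and the zero padding outside the detector are all assumptions made for illustration. The linear interpolation in `sample` is the step that a GPU texture engine can perform in hardware.

```python
import math


def sample(proj, t):
    """Linearly interpolate projection `proj` at fractional bin `t`.

    Bins outside the detector are treated as zero (an assumption of this sketch).
    """
    i = math.floor(t)
    w = t - i
    left = proj[i] if 0 <= i < len(proj) else 0.0
    right = proj[i + 1] if 0 <= i + 1 < len(proj) else 0.0
    return (1.0 - w) * left + w * right


def back_project(sinogram, angles, size):
    """Accumulate every projection into a size x size image.

    sinogram: one list of detector bins per projection angle.
    angles:   projection angles in radians, same length as sinogram.
    """
    n_bins = len(sinogram[0])
    center_img = (size - 1) / 2.0    # assumed rotation axis in the image
    center_det = (n_bins - 1) / 2.0  # assumed rotation axis on the detector
    image = [[0.0] * size for _ in range(size)]
    for proj, theta in zip(sinogram, angles):
        c, s = math.cos(theta), math.sin(theta)
        for y in range(size):
            for x in range(size):
                # Detector position hit by the parallel ray through pixel (x, y).
                t = (x - center_img) * c + (y - center_img) * s + center_det
                image[y][x] += sample(proj, t)
    return image
```

Back-projecting a single projection taken at angle 0 that is nonzero only in its central bin smears that value along one image column, which is the expected behavior for a parallel beam. Each pixel update is independent of the others, which is what makes the algorithm a natural fit for parallel hardware.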

Highlights

X-ray tomography is a powerful tool to investigate materials and small animals at the micro- and nano-scale [1].
Our results show that all NVIDIA GPUs starting with Fermi benefit from 64-bit texture fetches if requests are properly localized.
Type conversions are executed at half the rate of peak floating-point performance on AMD GCN GPUs, but only a single type-conversion instruction can be executed per 12 floating-point operations on NVIDIA Kepler GPUs.
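When type conversions are this expensive, one generally known way to avoid a float-to-integer conversion instruction is to round with floating-point additions only. The sketch below shows that general trick; this page does not state which method the paper uses, so it should not be read as the authors' approach. The name `round_via_fp` and the constant `MAGIC` are illustrative. Python floats are IEEE 754 doubles, so the constant is 1.5 * 2^52; the single-precision equivalent would be 1.5 * 2^23.

```python
# Adding a constant whose unit in the last place is 1 forces the hardware to
# round the sum to an integer; subtracting the constant recovers that integer.
MAGIC = 1.5 * 2.0 ** 52


def round_via_fp(x):
    """Round x to the nearest integer (ties to even) using only add/subtract.

    Valid for |x| < 2**51 under the default round-to-nearest-even mode.
    """
    return (x + MAGIC) - MAGIC
```

For values in the valid range the result agrees with Python's built-in `round`, which also rounds ties to even, so `round_via_fp(2.5)` gives `2.0` and `round_via_fp(3.5)` gives `4.0`.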

Summary

Introduction

X-ray tomography is a powerful tool to investigate materials and small animals at the micro- and nano-scale [1]. A recent study suggests implementing back-projection as a convolution in log-polar coordinates to gain high reconstruction speed with interpolation in the image domain [23]. This new method has not yet been adopted in production environments. Multiple papers analyze a range of GPU architectures, reveal undisclosed details through micro-benchmarking, and propose guidelines for performance optimization [27,28,29]. This information is invaluable for understanding the factors that limit performance on a specific architecture and for finding alternative approaches that perform better. In [31], we presented two highly optimized back-projection algorithms for NVIDIA Pascal GPUs and a hybrid approach that balances the load between different GPU subsystems by using both in parallel.
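The idea of balancing load between two subsystems can be illustrated with a small idealized calculation. The sketch below assumes that a texture-based kernel and an ALU-based kernel use disjoint hardware resources and do not slow each other down, which real hardware only approximates. The function names and the throughput figures are hypothetical and are not measurements from the paper.

```python
def balance_split(rate_texture, rate_alu):
    """Return (fraction of work for the texture kernel, combined rate).

    The split is chosen so that both kernels finish at the same time.
    Rates are in arbitrary units of work per second.
    """
    total = rate_texture + rate_alu
    return rate_texture / total, total


def run_time(work, fraction, rate_texture, rate_alu):
    """Time until the slower of the two concurrently running kernels finishes."""
    time_texture = fraction * work / rate_texture
    time_alu = (1.0 - fraction) * work / rate_alu
    return max(time_texture, time_alu)
```

With hypothetical rates of 300 and 100 units per second, `balance_split(300.0, 100.0)` assigns 75% of the work to the texture kernel, and 1200 units of work finish in 3 seconds instead of the 4 seconds the texture kernel would need alone. Any other split leaves one subsystem idle while the other is still working.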

Hardware platform

Benchmarking strategy

Quality evaluation

Pseudo‐code conventions

Parallel architectures

Hardware architecture

Execution model

Memory hierarchy

Texture engine

Task partitioning

Code generation

Scheduling

Synchronization

Communication

Summary

Tomographic reconstruction

Back‐projection based on texture engine

Standard version

Multi‐slice reconstruction

Using half‐precision data representation

Efficiency of the standard algorithm

Optimizing locality of texture fetches

Optimizing memory bandwidth

Optimizing occupancy

Summary

Alternative algorithm based on ALUs

The concept

Base implementation

Optimizing the thread mapping to avoid shared memory bank conflicts

AMD and Fermi 32

Modeling

Rounding using floating‐point arithmetic

Method

Half‐float cache

Additional caches

Managing occupancy

CPU and Xeon Phi

Hybrid approaches

Combined approach for Pascal architecture

Oversampling

Findings

Conclusion

Journal: Journal of Real-Time Image Processing	Publication Date: Jun 26, 2019
Citations: 5	License type: open-access
