Graphics Processing Units Architecture Research Articles

Efficient computation of the geopotential gradient is essential for numerical propagators, particularly in scenarios involving low Earth orbits. Conventional geopotential calculations are based on spherical harmonic series, which become computationally demanding as the degree/order increases. This computational burden can be mitigated by means of parallelized algorithms. Additionally, certain situations lend themselves to high parallelization, such as the propagation of space debris catalogs, satellite mega-constellations, or the dispersion of particles resulting from a space collision event. This paper introduces an optimized Graphics Processing Unit (GPU) implementation designed to facilitate extensive parallelization in the geopotential gradient calculation. The formulation developed in this study is not specific to any GPU. However, to illustrate the low-level optimizations necessary for an efficient implementation, we selected the Compute Unified Device Architecture (CUDA), the dominant and de facto standard in parallel computing. Nevertheless, most of the concepts and optimizations presented in this paper are also valid for other GPU architectures. Built upon the spherical harmonic expansion using the Cunningham formulation, which is well-suited for GPU computations, our implementation offers several variants with different tradeoffs between speed and accuracy. Besides GPU double precision, we introduce mixed-precision arithmetic (a hybrid between single and double precision) that exploits GPU capabilities with a low penalty in accuracy. The proposed algorithm was implemented as a reusable software module, and its performance was evaluated against the GMAT, GODOT, and Orekit astrodynamics codes. The algorithm's accuracy in double precision is comparable to that of these codes. The mixed-precision version showed enough accuracy for LEO satellite propagation, with around 1 m difference in four days. Testing across different CUDA architectures revealed very high speed-up factors compared to a single CPU, reaching a speed-up of 645 for the mixed-precision variant and 450 for the double-precision one in the propagation of about 3200 objects with a geopotential of degree/order 126 × 126 using an A100 GPU device.
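The abstract above does not say how its mixed-precision arithmetic splits work between single and double precision. As a generic illustration only, and not the paper's algorithm, the sketch below assumes one common pattern: individual terms are rounded to single precision while the running sum is kept in double precision. The helper names (`to_f32`, `sum_single`, `sum_mixed`, `sum_double`) and the test series are hypothetical, and single precision is emulated in plain Python via `struct`.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(terms):
    """Emulated single precision: terms and accumulator are both rounded to float32."""
    acc = 0.0
    for t in terms:
        acc = to_f32(acc + to_f32(t))
    return acc


def sum_mixed(terms):
    """Assumed mixed scheme: terms rounded to float32, accumulator kept in double."""
    acc = 0.0
    for t in terms:
        acc += to_f32(t)
    return acc


def sum_double(terms):
    """Reference: everything in double precision."""
    acc = 0.0
    for t in terms:
        acc += t
    return acc


# Hypothetical test series with many small late terms, loosely mimicking a
# long expansion whose high-order terms are tiny compared with the partial sum.
terms = [1.0 / (n * n) for n in range(1, 20001)]

reference = sum_double(terms)
err_single = abs(sum_single(terms) - reference)
err_mixed = abs(sum_mixed(terms) - reference)

print(f"single-precision error: {err_single:.3e}")
print(f"mixed-precision error:  {err_mixed:.3e}")
```

In this toy case the single-precision accumulator stops absorbing terms once they fall below its rounding step, whereas the double-precision accumulator keeps them, so the mixed variant stays much closer to the reference. Whether the same split suits a real geopotential kernel depends on details the abstract does not give.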

Graphics Processing Units (GPUs) are widely used for modern applications with huge data sizes. However, the performance benefit of GPUs is limited by their memory capacity and bandwidth. Although GPU vendors improve memory capacity and bandwidth using 3D memory technology (HBM), many important workloads with terabytes of data still cannot fit in the provided capacity and are bound by the provided bandwidth. With limited GPU memory capacity, programmers must handle the data movement between GPU and host memories themselves, which is a significant programming burden. To ease programming, GPUs use an address space unified with the host that allows over-subscribing GPU memory, but this approach performs poorly once GPUs encounter memory page faults. Many recent works have tried to remedy capacity and bandwidth bottlenecks using dense non-volatile memories (NVMs) and true-3D stacking. However, these works mainly focus on one bottleneck or do not provide a scalable solution that fits future requirements. In this paper, we investigate true-3D stacking of dense, low-power, and refresh-free non-volatile phase change memory (PCM) on top of state-of-the-art GPU configurations to provide higher capacity and bandwidth within the available area and power budget. The higher density and lower power consumption of PCM provide higher capacity by integrating more cells in each 3D layer and enabling the stacking of more layers. However, we observe that stacking more than six layers of pure-PCM memory violates the thermal constraint and severely harms performance and power efficiency due to PCM's higher write latency and energy. Further, it degrades the lifetime of the GPU to less than one year. Utilizing a hybrid architecture that leverages the benefits of both DRAM and PCM memories has been widely studied by prior proposals; however, true-3D integration of such a hybrid memory architecture, especially on top of a powerful state-of-the-art GPU architecture, has not been investigated yet. We experimentally demonstrate that by covering 80% of write requests in DRAM and eliminating refresh overhead, true-3D stacking of eight 32GB layers of PCM along with two 8GB layers of DRAM is possible, resulting in a total of 272GB of memory capacity (8 × 32GB + 2 × 8GB). Based on the explored design requirements, we propose a 3D high-bandwidth high-capacity hybrid memory (H3DM) system utilizing a hybrid-3D (H3D)-aware remapping scheme to reduce expensive PCM writes to under 20% while avoiding DRAM refresh overhead. H3DM improves performance by up to 291% compared to the baseline GPU architecture while remaining within only 3% of an ideal case with DRAM-like access latency, on average. Moreover, when the dataset size grows beyond the baseline GPU memory space, H3DM improves performance and power by up to 648% and 87%, respectively, compared to the baseline GPU architecture, since it avoids expensive data transfers through off-chip communication links.

Articles published on Graphics Processing Units Architecture

High-speed turbulent flows towards the exascale: STREAmS-2 porting and performance

LibERI: A portable and performant multi-GPU accelerated library for electron repulsion integrals via OpenMP offloading and standard language parallelism.

A GPU-accelerated Monte Carlo code, RT2 for coupled transport of photon, electron/positron, and neutron

A novel phase-field lattice Boltzmann framework for diffusion-driven multiphase evaporation

GPU-based parallel programming for FEM analysis in the optimization of steel frames

Efficient computation of the geopotential gradient in graphic processing units

Time Predictable Modeling Method for GPU Architecture with SIMT and Cache Miss Awareness

Enhancing GPU performance and energy efficiency: Innovative strategies for sustainable computing

Agnostic Energy Consumption Models for Heterogeneous GPUs in Cloud Computing

FLEW: A DNS Solver for Compressible Flows in Generalized Curvilinear Coordinates

H3DM: A High-bandwidth High-capacity Hybrid 3D Memory Design for GPUs

Assessing the Impact of Compiler Optimizations on GPUs Reliability

Quantitative Performance Analysis of BLAS Libraries on GPU Architectures

Parallel computation to bidimensional heat equation using MPI/CUDA and FFTW package

GPU-Accelerated Signal Processing for Passive Bistatic Radar

Physical vapor deposition simulator by graphical processor unit ray casting

An Enhanced Python-Based Open-Source Particle Image Velocimetry Software for Use with Central Processing Units

Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives.

H-Analysis and data-parallel physics-informed neural networks

Solution of nonlinear fractional-order models of nuclear reactor with parallel computing: Implementation on GPU platform
