CUDA Implementation Research Articles

The Particle Image Velocimetry (PIV) method is widely used for optical measurement of flow velocity fields. This paper demonstrates the possibilities of using high-level libraries for GPU-accelerated PIV data analysis in Python. The TorchPIV library for the analysis of 2D PIV experiments was developed on top of the deep learning framework PyTorch with CUDA support. The library implements a multi-pass FFT cross-correlation PIV algorithm with an interrogation window shift. The chosen implementation does not require compilation by the user, has a compact codebase, can run on either the CPU or the GPU depending on the user's choice, and is as flexible as an ordinary Python module. In this work, the performance of the CPU version of the developed method was compared with existing open-source implementations. It is shown that the main functions of the developed module can be executed on the GPU at the speed of CUDA implementations. The developed library is tested on synthetic images and experimental data.

Program Summary
Program title: TorchPIV
CPC Library link to program files: https://doi.org/10.17632/yw43vjc36h.1
Developer's repository link: https://github.com/NikNazarov/TorchPIV
Licensing provisions: MIT license
Programming language: Python
Nature of problem: PIV experiments often require analyzing a large number of images in order to determine the statistical characteristics of the flow. GPUs are actively used in this field to speed up the data analysis process. Open-source software solutions to this problem usually require a build step and are difficult to integrate into analysis pipelines.
Solution method: The main feature of the developed module is its flexibility and simple distribution. The module is cross-platform, and installation does not require compilation by the user. The developed library can be imported as an ordinary Python module, and at the same time it delivers a significant performance gain when analyzing PIV experiments on NVIDIA GPUs. Implementation in pure Python allows the module to serve as a backend for more complex experimental data processing systems. The core library of this method is one of the most reliable and widespread in the field of machine learning.
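The FFT cross-correlation step named in the abstract can be illustrated for a single interrogation window. The sketch below is not the TorchPIV API: the function name `window_displacement` is hypothetical, NumPy stands in for PyTorch to keep the example dependency-light, and only one pass with integer-pixel accuracy is shown (no multi-pass refinement, window shift, or sub-pixel peak fitting).

```python
import numpy as np

def window_displacement(win_a, win_b):
    """Integer-pixel shift of win_b relative to win_a (hypothetical helper).

    Uses the correlation theorem: corr = IFFT(conj(FFT(a)) * FFT(b)).
    """
    # Remove the mean so the background intensity does not dominate the peak.
    a = win_a - win_a.mean()
    b = win_b - win_b.mean()
    spectrum = np.conj(np.fft.rfft2(a)) * np.fft.rfft2(b)
    corr = np.fft.irfft2(spectrum, s=a.shape)
    # Move zero displacement to the centre of the correlation map.
    corr = np.fft.fftshift(corr)
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    dy = int(peak[0]) - corr.shape[0] // 2
    dx = int(peak[1]) - corr.shape[1] // 2
    return dy, dx

# Synthetic check: shift a random 32x32 window by a known amount.
rng = np.random.default_rng(0)
frame_a = rng.random((32, 32))
frame_b = np.roll(frame_a, (3, -5), axis=(0, 1))
print(window_displacement(frame_a, frame_b))
```

For this synthetic pair the imposed circular shift of (3, -5) pixels is recovered. In a GPU setting, the same operations would presumably be batched over all interrogation windows at once, which is where a tensor library becomes useful.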


Approximations based on Chebyshev polynomials have several astrodynamic applications. The performance of these approximations can be improved by parallel implementations that exploit parallel programming platforms such as OpenMP and CUDA. In this paper, we introduce parallel implementations of two astrodynamic applications. The first is the gravitational finite element model (FEM): a piecewise Chebyshev approximation that replaces high degree and order gravitational spherical harmonic models (SHMs). Much lower degree, locally valid functions can thus model local gravity perturbations and be evaluated efficiently in parallel. For this model, the total gravity acceleration is split into a reference term and a disturbance term. The reference includes two-body plus J_2 terms, which are relatively cheap to compute. The FEM approximates the higher-order gravity terms. It is developed from a 2D mesh grid covering a sphere of a specified radius, and a family of spherical shells is sampled using a cosine distribution in the radial direction. To reduce the required memory when seeking a specific accuracy, an adaptive version of the gravitational FEM is introduced. In addition, a parallel implementation of the FEM using OpenMP is presented. We show the runtime comparison for the 200 degree × 200 order EGM2008 SHM and the serial and parallel equivalent FEM algorithms. The other application is the Chebyshev-Picard method (CPM): a numerical integrator that solves an ordinary differential equation by approximating the integrand with a Chebyshev approximant and iterating over the trajectory via Picard iteration. A parallel CUDA implementation of the CPM in conjunction with the EGM2008 SHM and the FEM is introduced. We present numerical examples for propagating four Earth-orbiting satellites considering both the 200 × 200 EGM2008 SHM and the equivalent FEM representation to test the algorithm's performance via parallel and serial computation (i.e., a single CPU thread).
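The Chebyshev-Picard idea described above can be sketched for a scalar ODE. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `picard_chebyshev`, the node count, and the fixed iteration count are all hypothetical choices, and the gravity models are replaced by a simple test equation. Each Picard sweep evaluates the right-hand side at all nodes independently, which is the part that lends itself to parallel execution.

```python
import math
import numpy as np
from numpy.polynomial import chebyshev as C

def picard_chebyshev(f, x0, t0, t1, degree=20, iterations=30):
    """Solve dx/dt = f(t, x), x(t0) = x0 on [t0, t1] (scalar case only).

    Picard iteration: x_{k+1}(t) = x0 + integral from t0 to t of f(s, x_k(s)) ds,
    with the integrand replaced by its Chebyshev interpolant.
    """
    # Chebyshev-Gauss-Lobatto nodes on tau in [-1, 1]; tau[0] = 1 maps to t1.
    tau = np.cos(np.pi * np.arange(degree + 1) / degree)
    half = 0.5 * (t1 - t0)
    t = t0 + half * (tau + 1.0)
    x = np.full_like(t, x0, dtype=float)  # initial guess: constant trajectory
    for _ in range(iterations):
        g = half * f(t, x)                   # integrand expressed in tau
        coef = C.chebfit(tau, g, degree)     # interpolate at the nodes
        integ = C.chebint(coef, lbnd=-1.0)   # antiderivative, zero at tau = -1
        x = x0 + C.chebval(tau, integ)       # updated trajectory at the nodes
    return t, x

# Test problem with a known solution: dx/dt = -x, x(0) = 1, so x(1) = exp(-1).
t, x = picard_chebyshev(lambda t, x: -x, 1.0, 0.0, 1.0)
print(x[0], math.exp(-1.0))
```

A fixed iteration count keeps the sketch short; a practical integrator would more likely stop when successive trajectories agree to a tolerance.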



Articles published on CUDA Implementation

End-to-End Deployment of Winograd-Based DNNs on Edge GPU

CUDA acceleration of MI-based feature selection methods

Parallel Implementation of Lightweight Secure Hash Algorithm on CPU and GPU Environments

An Efficient Explicit Moving Particle Simulation Solver for Simulating Free Surface Flow on Multicore CPU/GPUs

Efficient GPU implementation of randomized SVD and its applications

CunuSHT: GPU accelerated spherical harmonic transforms on arbitrary pixelizations

Optimized CUDA Implementation to Improve the Performance of Bundle Adjustment Algorithm on GPUs

High level GPU-accelerated 2D PIV framework in Python

An Evaluation of Directive-Based Parallelization on the GPU Using a Parboil Benchmark

Real-Time Ego-Lane Detection in a Low-Cost Embedded Platform using CUDA-Based Implementation

CATE: A fast and scalable CUDA implementation to conduct highly parallelized evolutionary tests on large scale genomic data

Exploring the performance and portability of the k-means algorithm on SYCL across CPU and GPU architectures

CUDA implementation of the antlion optimization algorithm

Irregular alignment of arbitrarily long DNA sequences on GPU

Outperforming Sequential Full-Word Long Addition With Parallelization and Vectorization

Efficient parallel implementation of crowd simulation using a hybrid CPU+GPU high performance computing system

Improving the Speed and Quality of Parallel Graph Coloring

BitCracker: BitLocker meets GPUs

Parallel Evaluation of Chebyshev Approximations: Applications in Astrodynamics

Path integral radiative transfer via polyline representation allowing GPU implementation

