Abstract
Digital Image Correlation (DIC) is a popular non-contact, image-based, full-field deformation measurement tool widely used in mechanics. In spite of its significant advantages, it is still primarily used as a post-processing tool due to its computational cost. In recent years, parallel computing platforms such as multi-core processors and Graphics Processing Units (GPUs) have been used to improve the speed of the DIC algorithm, with GPUs being well-suited for implementing data-parallel operations. Previous works have performed GPU-based DIC wherein each sub-image (i.e. a collection of a few pixels in the local neighborhood of a point of interest) is allocated to a single thread on the GPU, thus achieving parallelism across sub-images. However, this is not the only type of parallelism that is possible: one can also achieve parallelism within a sub-image as well as across whole images. The aim of this work is to implement 2D-DIC efficiently, such that parallelism within a sub-image as well as across sub-images leads to a considerable reduction in computation time. We use a heterogeneous framework consisting of an Intel Xeon octa-core CPU and an Nvidia Tesla K20C GPU card. The CPU handles image pre-processing, whereas the GPU processes four compute-intensive tasks: affine shape function computation, B-Spline interpolation, residual vector calculation and deformation vector update. Parallelization within and across sub-images is achieved by efficient thread handling and the use of pre-compiled BLAS libraries. In order to estimate the speedup provided by the GPU, the same four tasks were also evaluated on the octa-core CPU; for a single sub-image, the observed speedup decreased from approximately 7 times to approximately 5 times as the sub-image size increased from 21×21 to 61×61 pixels.
However, the GPU speedup is expected to be higher for a larger number of sub-images, and this is indeed the case: when the affine shape function computation and B-Spline interpolation steps were evaluated on 1869 sub-images of 21×21 pixels each, the speedup was around 453 times. Further GPU optimization as well as parallelization across image pairs is currently underway, and even faster GPU-assisted DIC seems achievable.

Keywords: Full-field displacement; Sub-image; Parallel computing; Heterogeneous framework; Compute Unified Device Architecture (CUDA); Thread; Kernel
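To make the two levels of parallelism described in the abstract concrete, the following is a minimal CPU-side sketch in Python, not the authors' GPU implementation. It assumes the common first-order (affine) DIC shape function with a deformation vector p = (u, ux, uy, v, vx, vy); the abstract does not state the exact parameterization, so this form, the function names `affine_warp` and `warp_all`, and the thread pool are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def affine_warp(center, half_width, p):
    """Warped coordinates of every pixel in one square sub-image.

    center     : (x0, y0) of the point of interest
    half_width : 10 gives a 21x21 sub-image
    p          : assumed deformation vector (u, ux, uy, v, vx, vy)
    """
    x0, y0 = center
    u, ux, uy, v, vx, vy = p
    offsets = range(-half_width, half_width + 1)
    # Each pixel depends only on its own offset (dx, dy), so every
    # iteration is independent. This is the "within a sub-image"
    # parallelism; on a GPU each pixel could map to one thread.
    return [
        (x0 + dx + u + ux * dx + uy * dy,
         y0 + dy + v + vx * dx + vy * dy)
        for dy in offsets
        for dx in offsets
    ]


def warp_all(centers, half_width, params):
    """Warp many sub-images, each with its own deformation vector."""
    # Sub-images do not depend on one another ("across sub-images"
    # parallelism). The thread pool is only a stand-in for a grid of
    # GPU thread blocks; it shows the independence of the work items
    # and is not expected to give a real speedup in CPython.
    jobs = zip(centers, params)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda cp: affine_warp(cp[0], half_width, cp[1]), jobs))
```

In this sketch the warped coordinates would then feed the B-Spline interpolation step, which evaluates the deformed image at those non-integer positions; that step, the residual vector calculation and the deformation vector update are omitted here.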