Accelerating stereo vision algorithm using SSE3, AVX2, and CUDA

M Kokhazadeh, M Dehyadegari, M Daneshtalab, Z Kokhazad

doi:10.1109/iraniancee.2017.7985426

Abstract

Stereo vision is widely used in robotics, unmanned cars, aerial surveys, and many real-time applications. It is also computationally expensive because of the stereo matching step. In real-time applications, the execution time of the stereo vision depth detection algorithm is critical. This paper studies how Intel SIMD instructions and CUDA reduce the execution time of stereo vision. Both improve performance by exploiting data-level parallelism. We present a fast implementation of the sum of squared differences (SSD) stereo vision algorithm on Intel processors using the SSE3 and AVX2 SIMD instruction sets and on an NVIDIA Graphics Processing Unit (GPU) using CUDA, and we compare the results with a serial implementation. The algorithm is applied across a range of disparities (from 16 to 256), window sizes (from 3×3 to 15×15), and image resolutions (from 256×212 to 1408×1168). For a disparity of 64 and a window size of 3×3, we achieve 182 frames per second in CUDA, 64 frames per second in AVX2, and 25 frames per second in SSE3. Experimental results show speedups of up to 5× in SSE3, 10× in AVX2, and 21× in CUDA compared to the serial implementation.
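For orientation, SSD stereo matching can be sketched as a plain serial loop. The following is a minimal illustration, not the authors' implementation: it assumes rectified grayscale images stored as lists of rows, and the function name `ssd_disparity` and its parameters are hypothetical. For each pixel it compares a square window in the left image against windows shifted by each candidate disparity in the right image and keeps the shift with the smallest sum of squared differences.

```python
def ssd_disparity(left, right, max_disparity, window):
    """Serial SSD block matching (illustrative sketch, hypothetical API).

    left, right   : rectified grayscale images as lists of equal-length rows
    max_disparity : number of candidate disparities (0 .. max_disparity - 1)
    window        : odd side length of the square matching window
    Border pixels that cannot hold a full window keep disparity 0.
    """
    height, width = len(left), len(left[0])
    half = window // 2
    disparity = [[0] * width for _ in range(height)]

    for y in range(half, height - half):
        for x in range(half, width - half):
            best_cost, best_d = None, 0
            # A left pixel at column x is searched at column x - d on the right,
            # so d is capped to keep the shifted window inside the image.
            for d in range(min(max_disparity, x - half + 1)):
                cost = 0
                for dy in range(-half, half + 1):
                    for dx in range(-half, half + 1):
                        diff = left[y + dy][x + dx] - right[y + dy][x + dx - d]
                        cost += diff * diff
                # Strict comparison keeps the smallest disparity on ties.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

The cost grows with image area × disparity range × window area, which is consistent with the parameters the abstract varies. The inner window loops perform the same arithmetic on neighbouring pixels, which is the kind of data-level parallelism that SIMD instructions and CUDA threads can exploit.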

Full Text