Abstract

This paper describes an efficient implementation of the semi-global matching (SGM) algorithm on multi-core processors that allows a nearly arbitrary number of path directions for the cost aggregation stage. The scanlines for each orientation are discretized iteratively, but only once, and the regular substructures of the resulting template are reused and shifted to concurrently sum up the path costs in at most two sweeps per direction over the disparity space image. Since paths never overlap, no expensive thread synchronization is needed. To further reduce the runtime for high numbers of path directions, pixel-wise disparity gating is applied, and both the cost function and the disparity loop of SGM are optimized using current single instruction multiple data (SIMD) intrinsics for two major CPU architectures. An evaluation of the proposed implementation on synthetic ground truth shows a reduced height error when the number of aggregation directions is increased significantly or when the paths start with an angular offset. The overall runtime shows a speedup that is nearly linear in the number of available processors.
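
Each path summed in the aggregation stage follows the standard SGM recurrence of Hirschmüller. The aggregated cost of a pixel at disparity d is its matching cost plus the cheapest transition from the previous pixel on the path. A disparity change of one costs P1, and any larger change costs P2. The previous minimum is then subtracted to keep the values bounded.

The following pure-Python sketch aggregates a single path. The function name aggregate_path, the list-of-lists cost layout and the penalty values are illustrative assumptions. None of the paper's SIMD or integer-width optimizations are reproduced.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one path with the SGM recurrence.

    costs: costs[i][d] is the matching cost of the i-th pixel on the path
    (in traversal order) at disparity d. p1 penalizes a disparity change of
    exactly 1 between neighbors, p2 any larger change (p2 >= p1 assumed).
    Returns the aggregated path costs L with the same shape as costs.
    """
    num_disp = len(costs[0])
    # The first pixel on the path has no predecessor: L equals its raw cost.
    L = [list(costs[0])]
    for c in costs[1:]:
        prev = L[-1]
        min_prev = min(prev)
        cur = []
        for d in range(num_disp):
            # Cheapest transition: same disparity, any jump (p2), or +/-1 (p1).
            best = min(prev[d], min_prev + p2)
            if d > 0:
                best = min(best, prev[d - 1] + p1)
            if d + 1 < num_disp:
                best = min(best, prev[d + 1] + p1)
            # Subtracting min_prev keeps L bounded by max(costs) + p2.
            cur.append(c[d] + best - min_prev)
        L.append(cur)
    return L
```

As an example, aggregate_path([[0, 5, 5], [5, 0, 5]], 1, 3) yields [[0, 5, 5], [5, 1, 8]]. At the second pixel, disparity 1 inherits the cheap predecessor at disparity 0 for a penalty of P1. In full SGM, the results of all path directions are summed per pixel and disparity, and the disparity with the lowest total is then selected.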

Highlights

  • Solving the correspondence problem is fundamental to photogrammetry because it allows the reconstruction of 3D points from 2D bitmaps when the underlying camera geometry is known

  • This paper describes an implementation of semi-global matching (SGM) that supports a nearly arbitrary positive number of path directions for the cost aggregation stage

  • The expensive cost calculation and aggregation stages of the proposed algorithm are parallelized using single instruction multiple data (SIMD) commands for the omnipresent Intel x86-64 and ARM CPU architectures. Both the runtime and the disparity map quality of the new SGM implementation are evaluated on a synthetic 3D scene that is rendered into error-free images from oriented pinhole cameras, together with depth ground truth
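
One simple way to obtain an arbitrary number of path directions is to space them evenly over 360 degrees, optionally rotated by an angular offset. Each ray is then rasterized once into a template of integer pixel offsets, which can be added to any start pixel.

The sketch below assumes a DDA-style discretization that advances one pixel per step along the dominant axis and rounds the minor coordinate. The function names path_directions, discretize and trace, and the rounding rule, are hypothetical and not taken from the paper. The paper additionally exploits the regular substructures of the template, which this sketch does not attempt.

```python
import math


def path_directions(n, offset_deg=0.0):
    """Return n unit direction vectors spaced evenly over 360 degrees,
    with the first direction rotated by offset_deg (hypothetical helper)."""
    angles = [math.radians(offset_deg + 360.0 * k / n) for k in range(n)]
    return [(math.cos(a), math.sin(a)) for a in angles]


def discretize(direction, steps):
    """Rasterize a ray from the origin into `steps` integer pixel offsets.

    Advances exactly one pixel per step along the dominant axis and rounds
    the coordinate along the minor axis (DDA-style; an assumption).
    """
    dx, dy = direction
    major = max(abs(dx), abs(dy))
    sx, sy = dx / major, dy / major  # the dominant component becomes +/-1
    return [(round(i * sx), round(i * sy)) for i in range(1, steps + 1)]


def trace(template, x0, y0):
    """Shift a precomputed template so the path continues from (x0, y0)."""
    return [(x0 + ox, y0 + oy) for ox, oy in template]
```

For 16 directions, the template of the 22.5° path is [(1, 0), (2, 1), (3, 1), (4, 2)]. Because it is computed only once, tracing that path from any start pixel only requires adding the pixel's coordinates to each offset.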

Summary

INTRODUCTION

Solving the correspondence problem is fundamental to photogrammetry because it allows the reconstruction of 3D points from 2D bitmaps when the underlying camera geometry is known. The tSGM algorithm of Rothermel (2016) reduces both the number of calculations and the amount of memory required for the cost aggregation stage of SGM. This is achieved through dynamic bounds on the disparity range of each image pixel. The expensive cost calculation and aggregation stages of the proposed algorithm are parallelized using single instruction multiple data (SIMD) commands for the omnipresent Intel x86-64 and ARM CPU architectures. Both the runtime and the disparity map quality of the new SGM implementation are evaluated on a synthetic 3D scene that is rendered into error-free images from oriented pinhole cameras, together with depth ground truth. Extensive optimization of the SGM penalty functions and disparity map refinement techniques are not considered; these may complement the proposed aggregation scheme.
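
Dynamic disparity bounds in the spirit of tSGM restrict each pixel's search range to an interval around a prior estimate. Only the cost cells inside that interval then need to be computed, stored and aggregated.

The sketch below makes several assumptions:
- The prior is a disparity map from a coarser pyramid level, already resampled to full resolution.
- Each pixel's interval is the minimum and maximum prior disparity in its 3x3 neighborhood, widened by a tolerance.

The window size, the tolerance and the names disparity_bounds and cell_count are illustrative. They are not the exact rules of tSGM or of the paper's pixel-wise disparity gating.

```python
def disparity_bounds(prior, tolerance, num_disp):
    """Return a per-pixel (lo, hi) disparity range derived from a prior map.

    prior: 2D list of prior disparities at full resolution (assumed input).
    Each range spans the min/max prior disparity in the pixel's 3x3
    neighborhood, widened by `tolerance` and clipped to [0, num_disp - 1].
    """
    h, w = len(prior), len(prior[0])
    bounds = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the prior disparities of the 3x3 window, clipped at borders.
            neighborhood = [prior[j][i]
                            for j in range(max(0, y - 1), min(h, y + 2))
                            for i in range(max(0, x - 1), min(w, x + 2))]
            lo = max(0, min(neighborhood) - tolerance)
            hi = min(num_disp - 1, max(neighborhood) + tolerance)
            row.append((lo, hi))
        bounds.append(row)
    return bounds


def cell_count(bounds):
    """Number of cost cells that must be computed and aggregated."""
    return sum(hi - lo + 1 for row in bounds for lo, hi in row)
```

Take a prior of [[10, 10], [10, 12]] with a tolerance of 2 and 64 disparities. Every pixel is bounded to [8, 14], so 28 cost cells are kept instead of the 4 × 64 = 256 cells of a full disparity space image.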

REVIEW OF THE BASELINE SGM ALGORITHM
SGM WITH AN ARBITRARY DIRECTION COUNT
Primary sweep
IMPLEMENTATION AND OPTIMIZATION
Secondary sweep
Disparity gating
Multithreading
Matching quality
Runtime and scalability
Findings
CONCLUSION