Abstract

Semiglobal matching is an accurate stereo depth estimation algorithm, but implementing a high-throughput architecture for it has been challenging because of the inherent recursion in inter-pixel cost aggregation. In particular, the computation on the horizontal scan pass is the critical path that limits throughput. In this paper, we propose a new cluster-wise cost aggregation algorithm and an optimized architecture that make it possible to pipeline the inter-pixel aggregation and parallelize the scanline-level disparity computation. The proposed aggregation is performed not on every pixel but on each group of pixels, which significantly relaxes the timing constraint on the recursion. The disparity values at multiple shifted pixel positions are computed concurrently within a single clock period. We also propose a memory reduction scheme that selects a small number of informative values and achieves a 96% memory reduction compared with the straightforward approach of storing all values. A system-on-chip-based tiled processing scheme is employed, which allows implementation without external memory. The proposed architecture computes a depth map with 128 disparity levels at 103 frames per second on a full HD image on the Zynq UltraScale+ MPSoC platform, which is 2.6 times faster than the state-of-the-art 8-path semiglobal matching implementation at comparable accuracy.
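To make the recursion mentioned above concrete, the following is a minimal sketch of the standard per-pixel semiglobal matching aggregation along one left-to-right scanline. It is not the cluster-wise algorithm proposed in the paper. The function name `aggregate_left_to_right` and the penalty parameters `p1` and `p2` are illustrative assumptions, following the usual small and large smoothness penalties of the textbook formulation.

```python
def aggregate_left_to_right(cost_row, p1, p2):
    """Standard SGM aggregation along one horizontal path (illustrative only).

    cost_row[x][d] is the matching cost of pixel x at disparity d.
    p1 penalizes a disparity change of one level, p2 any larger change.
    """
    width = len(cost_row)
    levels = len(cost_row[0])
    # The first pixel on the path has no predecessor, so it keeps its raw cost.
    aggregated = [list(cost_row[0])]
    for x in range(1, width):
        # Pixel x depends on the finished result of pixel x - 1:
        # this data dependency is the recursion that limits throughput.
        prev = aggregated[x - 1]
        prev_min = min(prev)
        row = []
        for d in range(levels):
            best = prev[d]
            if d > 0:
                best = min(best, prev[d - 1] + p1)
            if d + 1 < levels:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)
            # Subtracting prev_min keeps the aggregated values bounded.
            row.append(cost_row[x][d] + best - prev_min)
        aggregated.append(row)
    return aggregated
```

Because each output row needs the complete previous row, a direct hardware mapping must finish one pixel per clock cycle on this path, which is the timing constraint the group-of-pixels approach is described as relaxing.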
