Abstract

Binocular disparity and motion parallax are the most important cues for depth estimation in human and computer vision. Here, we present an experimental study that evaluates the accuracy of these two cues for estimating the distance to stationary objects in a static environment. Depth estimation via binocular disparity is most commonly implemented using stereo vision, which triangulates distances from images taken by two or more cameras. We use a commercial stereo camera mounted on a wheeled robot to create a depth map of the environment. The sequence of images obtained by one of these two cameras, together with the measured camera motion parameters (translational and angular velocities), serves as the input to our motion parallax-based depth estimation algorithm. Reference distances to the tracked features are provided by a LiDAR. Overall, our results show that stereo vision is more accurate at short distances, but at large distances the combination of parallax and camera motion provides better depth estimates. Therefore, by combining the two cues, one obtains depth estimation over a greater range than is possible with either cue alone.
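The two depth cues compared above reduce to the same triangulation geometry: stereo vision uses the fixed baseline between two cameras, while motion parallax uses the camera's own translation as a baseline. A minimal sketch of both relations follows; the focal length, baseline, and motion values are illustrative assumptions, not parameters from this study.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Stereo triangulation: Z = f * B / d, with disparity d in pixels,
    focal length f in pixels, and baseline B in meters."""
    return focal_px * baseline_m / disparity_px


def depth_from_parallax(translation_m: float, focal_px: float, image_shift_px: float) -> float:
    """Motion parallax for a purely lateral camera translation T:
    Z = f * T / dx, where dx is the image shift of the tracked feature.
    This is the same triangulation with the motion serving as the baseline."""
    return focal_px * translation_m / image_shift_px


# Illustrative values (assumed): f = 600 px, stereo baseline = 0.12 m.
print(depth_from_disparity(12.0, 600.0, 0.12))  # 6.0 (meters)
# A 0.5 m lateral move producing a 50 px feature shift gives the same depth.
print(depth_from_parallax(0.5, 600.0, 50.0))    # 6.0 (meters)
```

Because disparity shrinks inversely with depth, a fixed stereo baseline loses precision at range, whereas the motion baseline can be made arbitrarily large by moving farther, which is consistent with the trade-off reported above.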

Highlights

  • The human visual system relies on several different cues that provide depth information in static and dynamic environments: binocular disparity, motion parallax, the kinetic depth effect, looming, perspective cues from linear image elements, occlusion, smooth shading, blur, etc.

  • In the experiments with lower resolution, we considered three cases: two with favorable geometry for motion parallax, where the features were far from the focus of expansion, and one with poor geometry, where the features were close to the focus of expansion.

  • The performance of the stereo camera in depth estimation depends on the following factors: the distance and angle to the point features, texture, and camera resolution.


Introduction

The human visual system relies on several different cues that provide depth information in static and dynamic environments: binocular disparity, motion parallax, the kinetic depth effect, looming, perspective cues from linear image elements, occlusion, smooth shading, blur, etc. Information from multiple cues is combined to provide the viewer with a unified estimate of depth [1]. In this combination, the cues are weighted dynamically depending on the scene, observer motion, lighting conditions, etc. Computer vision approaches that take into account the combination of multiple cues can be implemented using semi-supervised deep neural networks [2]. In this approach, the depth of each pixel in an image is predicted directly from models trained offline on large collections of ground-truth depth data. Practical implementations usually incorporate monocular cues into a stereo system.

