Abstract

The majority of existing deep-learning-based approaches for estimating depth from a monocular image require highly accurate ground-truth depth information to train a supervised framework. However, accurate depth information is not always available, particularly for diverse outdoor scenes. To address this, a convolutional network architecture comprising two encoder-decoders is proposed, which uses a stereo matching criterion for training. The network parameters are optimized with an image reconstruction error measure instead of ground-truth depth information. To estimate accurate disparity maps in low-textured and occluded regions, a cross-based cost-aggregation loss term is proposed, along with a novel occlusion detection and filling method in the post-processing stage. Among unsupervised approaches on the KITTI 2015 dataset, the proposed method achieves a 6.219% improvement in RMS error for a depth cap of 80 m and a 6.15% improvement in RMS error for a depth cap of 50 m. The performance measures also establish the importance of channel-wise descriptors for training a deep neural network.

Keywords: Stereo matching; Disparity map; Occlusion; Fully Convolutional Network (FCN)
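The abstract names several building blocks without giving their details: an image reconstruction (photometric) loss that replaces ground-truth depth during training, occlusion detection and filling in post-processing, and RMS depth error under a depth cap. The NumPy sketch below illustrates common textbook versions of these pieces under stated assumptions; it is not the authors' method. All function names are hypothetical, and the following choices are assumptions: rectified grayscale images stored as H x W arrays, positive left disparities (a left pixel at column x matches the right pixel at x - d), a plain L1 photometric loss, the standard left-right consistency check, background-favouring hole filling, and a common KITTI-style evaluation convention for the depth cap. The paper's cross-based cost-aggregation loss and its own occlusion handling are not reproduced here.

```python
import numpy as np


def reconstruct_left(right: np.ndarray, disp_left: np.ndarray) -> np.ndarray:
    """Synthesize the left view by sampling the right image at column x - d(x, y).

    Assumption: rectified grayscale images (H x W) and positive left disparities.
    np.interp interpolates linearly and clamps samples that fall outside the row.
    """
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)
    out = np.empty((h, w), dtype=np.float64)
    for y in range(h):
        out[y] = np.interp(xs - disp_left[y], xs, right[y])
    return out


def reconstruction_loss(left, right, disp_left) -> float:
    """Mean absolute (L1) error between the left image and its reconstruction.
    No ground-truth depth is needed, only the stereo pair and a predicted disparity."""
    return float(np.mean(np.abs(left - reconstruct_left(right, disp_left))))


def lr_occlusion_mask(disp_left, disp_right, thresh=1.0) -> np.ndarray:
    """Standard left-right consistency check (assumed, not the paper's method):
    a left pixel is flagged as occluded when the right disparity at its matched
    location differs from its own disparity by more than `thresh` pixels."""
    h, w = disp_left.shape
    xs = np.arange(w)
    occluded = np.zeros((h, w), dtype=bool)
    for y in range(h):
        xr = np.clip(np.round(xs - disp_left[y]).astype(int), 0, w - 1)
        occluded[y] = np.abs(disp_left[y] - disp_right[y, xr]) > thresh
    return occluded


def fill_occlusions(disp, occluded) -> np.ndarray:
    """Replace each occluded pixel with the smaller of the nearest valid disparities
    to its left and right on the same row, i.e. assume it belongs to the background."""
    filled = disp.astype(np.float64)  # astype returns a copy
    for y in range(disp.shape[0]):
        valid = np.flatnonzero(~occluded[y])
        if valid.size == 0:
            continue  # no valid disparity on this row to propagate
        for x in np.flatnonzero(occluded[y]):
            neighbours = []
            before, after = valid[valid < x], valid[valid > x]
            if before.size:
                neighbours.append(filled[y, before[-1]])
            if after.size:
                neighbours.append(filled[y, after[0]])
            filled[y, x] = min(neighbours)
    return filled


def rmse_capped(pred_depth, gt_depth, cap=80.0, min_depth=1e-3) -> float:
    """RMS depth error over pixels whose ground truth lies in (min_depth, cap) metres;
    predictions are clipped to [min_depth, cap] before comparison (assumed convention)."""
    mask = (gt_depth > min_depth) & (gt_depth < cap)
    pred = np.clip(pred_depth[mask], min_depth, cap)
    return float(np.sqrt(np.mean((pred - gt_depth[mask]) ** 2)))
```

For example, with `right = np.tile(np.arange(5.0), (2, 1))` and a constant disparity of 1, `reconstruct_left` yields rows `[0, 0, 1, 2, 3]`: each pixel takes its value from one column to the left, and the first column is clamped to the image border. Calling `rmse_capped` with `cap=50.0` instead of the default gives the 50 m setting mentioned in the abstract.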
