Abstract
Monocular depth estimation is a fundamental task in computer vision, and its performance has improved greatly in recent years. However, most depth estimation networks rely on very deep backbones to extract features, which discards a large amount of information; the loss of object information during encoding and decoding is particularly severe. As a result, the estimated depth maps lack object structure detail and have blurred edges. The consequences of this information loss are especially serious in complex indoor environments, which are the focus of this paper. To address this problem, we propose a dense feature fusion network that uses a feature pyramid to aggregate features at multiple scales. Furthermore, to fuse decoded object contour information and depth information more effectively, we propose an adaptive depth fusion module that lets the network fuse depth maps of various scales adaptively, increasing the object information in the predicted depth map. Unlike previous work that predicts depth maps with a U-Net architecture, our final depth map is obtained by fusing multi-scale depth maps, each of which has its own characteristics. By fusing them, we estimate depth maps that contain accurate depth information as well as rich object contours and structure detail. Experiments show that the proposed model predicts depth maps with more object information than previous work while achieving competitive accuracy. Moreover, compared with other contemporary techniques, our method achieves state-of-the-art edge accuracy on the NYU Depth V2 dataset.
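The abstract does not give implementation details, so the following is only a minimal sketch of the feature-pyramid-style dense feature fusion it describes: features from several encoder stages are projected to a common channel width, upsampled to a common resolution, and merged, so that fine object detail from shallow stages is preserved alongside deep semantics. All module names, channel sizes, and layer choices here are our own illustrative assumptions, not the authors' code.

```python
# Sketch of dense multi-scale feature fusion (assumed design, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFeatureFusion(nn.Module):
    def __init__(self, in_channels=(64, 128, 256, 512), out_channels=64):
        super().__init__()
        # 1x1 convolutions bring every encoder stage to the same channel width.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        # 3x3 convolution smooths the summed features.
        self.fuse = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, features):
        # `features`: encoder maps ordered from shallow (high resolution) to deep (low resolution).
        target_size = features[0].shape[-2:]
        fused = 0
        for f, lat in zip(features, self.lateral):
            x = lat(f)
            # Upsample every scale to the resolution of the shallowest map before summing.
            x = F.interpolate(x, size=target_size, mode="bilinear", align_corners=False)
            fused = fused + x
        return self.fuse(fused)

# Usage with dummy encoder features of decreasing resolution.
feats = [torch.randn(1, c, 240 // 2**i, 320 // 2**i)
         for i, c in enumerate((64, 128, 256, 512))]
print(DenseFeatureFusion()(feats).shape)  # torch.Size([1, 64, 240, 320])
```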
Highlights
Depth estimation is a fundamental problem in computer vision, applied to robot navigation, augmented reality, 3D reconstruction, autonomous driving, and other fields
Most depth estimation models rely on very deep neural networks to extract image features and achieve good performance, but the feature maps produced by repeated convolutions lose much information, especially object information, so small objects and object structure detail are missing from the feature maps
Although U-Net performs well in many vision tasks, its gradual decoding makes it weak at multi-scale feature fusion. To address these problems, we propose a network that estimates depth from a single image by fusing multi-scale depth maps
Summary
Depth estimation is a fundamental problem in computer vision, applied to robot navigation, augmented reality, 3D reconstruction, autonomous driving, and other fields. Most depth estimation models rely on very deep neural networks to extract image features and achieve good performance, but the feature maps produced by repeated convolutions lose much information, especially object information, so small objects and object structure detail are missing from the feature maps. For indoor scenes with many objects, the impact of this information loss is serious. To deal with this problem, some previous works [7], [8] introduce skip connections to feed low-level features into the decoder module. Although U-Net performs well in many vision tasks, its gradual decoding makes it weak at multi-scale feature fusion. To address these problems, we propose a network that estimates depth from a single image by fusing multi-scale depth maps. By estimating coarse depth maps at various scales and performing a weighted summation over them, we obtain a depth map with both high accuracy and rich scene information. Extensive experiments show that our predicted depth maps contain more object information and clearer edges than those of previous works, while maintaining competitive depth accuracy on the NYU Depth V2 dataset
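As a rough illustration of the weighted-summation idea described above, the sketch below upsamples coarse depth maps predicted at several scales to full resolution and combines them with per-pixel weights produced by a small convolution and a softmax, so each location adaptively favours the scale that preserves its structure best. The module name, the weight head, and the number of scales are our own assumptions for illustration; the paper's adaptive depth fusion module may differ.

```python
# Sketch of adaptive weighted fusion of multi-scale depth maps (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveDepthFusion(nn.Module):
    def __init__(self, num_scales=4):
        super().__init__()
        # Predict one fusion-weight map per input depth scale.
        self.weight_head = nn.Conv2d(num_scales, num_scales, kernel_size=3, padding=1)

    def forward(self, depth_maps):
        # `depth_maps`: list of (B, 1, H_i, W_i) predictions, ordered coarse to fine.
        target_size = depth_maps[-1].shape[-2:]
        upsampled = [
            F.interpolate(d, size=target_size, mode="bilinear", align_corners=False)
            for d in depth_maps
        ]
        stacked = torch.cat(upsampled, dim=1)                     # (B, S, H, W)
        weights = torch.softmax(self.weight_head(stacked), dim=1)  # per-pixel weights over scales
        # Per-pixel weighted sum over the scale dimension.
        return (weights * stacked).sum(dim=1, keepdim=True)

# Example: fuse four dummy depth maps at increasing resolutions.
depths = [torch.rand(1, 1, 240 // 2**i, 320 // 2**i) for i in (3, 2, 1, 0)]
print(AdaptiveDepthFusion(num_scales=4)(depths).shape)  # torch.Size([1, 1, 240, 320])
```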