Ground-truth Depth Maps Research Articles

Deep learning approaches have contributed significantly to recent progress in stereo matching. These deep stereo matching methods usually rely on supervised training, which requires large amounts of high-quality ground-truth depth map annotations that are expensive to collect. Furthermore, only a limited quantity of stereo vision training data is currently available, obtained either with active sensors (Lidar and ToF cameras) or through computer graphics simulations, and it does not meet the requirements of deep supervised training. Here, we propose a novel deep stereo approach called the "self-supervised multiscale adversarial regression network (SMAR-Net)," which relaxes the need for ground-truth depth maps during training. Specifically, we design a two-stage network. The first stage is a disparity regressor, in which a regression network estimates disparity values from stacked stereo image pairs. The stereo image stacking method is a novel contribution: the stack not only contains the spatial appearance of the stereo images but also implies matching correspondences at different disparity values. In the second stage, a synthetic left image is generated based on the left-right consistency assumption. The network is trained by minimizing a hybrid loss function composed of a content loss and an adversarial loss. The content loss minimizes the average warping error between the synthetic images and the real ones. In contrast to the generative adversarial loss, the proposed adversarial loss penalizes mismatches using multiscale features, which constrains the synthetic image and the real image to be pixelwise identical rather than merely belonging to the same distribution. Moreover, using multiscale feature extraction in both the content loss and the adversarial loss improves the adaptability of SMAR-Net in ill-posed regions. Experiments on multiple benchmark datasets show that SMAR-Net outperforms the current state-of-the-art self-supervised methods and achieves results comparable to supervised methods. The source code can be accessed at: https://github.com/Dawnstar8411/SMAR-Net.
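The content loss described in this abstract, an average warping error between a synthesized left image and the real one, can be illustrated with a minimal NumPy sketch. This is not the SMAR-Net implementation: the function names `warp_right_to_left` and `content_loss`, the single-scale grayscale setting, the linear interpolation, and the border clamping are all assumptions made for illustration, and the adversarial and multiscale terms are omitted.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Synthesize a left view by sampling the right image at column x - d.

    Assumes rectified grayscale images of shape (H, W) and a per-pixel
    horizontal disparity map of the same shape (hypothetical setup).
    """
    h, w = right.shape
    # Source column in the right image for every left-image pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)          # clamp samples that fall outside the image
    x0 = np.floor(xs).astype(int)           # left neighbor column
    x1 = np.minimum(x0 + 1, w - 1)          # right neighbor column
    frac = xs - x0                          # linear interpolation weight
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def content_loss(left, right, disparity):
    """Mean absolute warping error between the real and synthesized left image."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))
```

With a correct disparity map the synthesized image matches the real left image everywhere except near the clamped border, so the loss is small; a wrong disparity map raises it. That difference is the training signal that replaces ground-truth depth in this kind of self-supervised setup.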

Most existing monocular depth estimation methods based on supervised learning require vast quantities of ground-truth depth data for training. To take full advantage of the information in images captured by drones without this requirement, an unsupervised monocular depth estimation model for drones is proposed, based on a residual neural network with coarse–refined feature extraction. Inspired by the principle of binocular depth estimation, a virtual camera is introduced through the deep residual convolutional neural network, so that unsupervised monocular depth estimation becomes an image reconstruction problem. To improve the performance of the model, the following innovations are proposed. First, pyramid processing of the input image is proposed to build the topological relationship between the resolution of the input image and its depth, which improves sensitivity to depth information in a single image and reduces the impact of input resolution on depth estimation. Second, the residual neural network with coarse–refined feature extraction for corresponding image reconstruction is designed to improve the accuracy of feature extraction and to resolve the trade-off between computation time and the number of network layers. In addition, to predict highly detailed output depth maps, long skip connections are added between corresponding layers of the coarse feature extraction network and the deconvolutional refined feature extraction network. Third, the corresponding-image reconstruction loss based on the structural similarity index (SSIM), the approximate disparity smoothness loss, and the depth map loss are combined into a novel training loss. The experimental results show that the model outperforms state-of-the-art monocular depth estimation methods on the KITTI dataset, which consists of corresponding left and right views, and on the Make3D dataset, which consists of images with corresponding ground-truth depth maps. When trained on KITTI, the model basically meets the depth information requirements of images captured by drones.
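As a rough illustration of how an SSIM-based reconstruction term and a disparity smoothness term might be combined into one training loss, consider the sketch below. It is not the authors' formulation: the global (non-windowed) SSIM, the added L1 term, the weights `alpha` and `lam`, and all function names are assumptions, and the abstract's third term (the depth map loss) is left out because the abstract does not define it.

```python
import numpy as np

def global_ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed once over the whole image (a simplification of windowed SSIM).

    Assumes grayscale images with intensities in [0, 1].
    """
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def smoothness(disp):
    """Mean absolute horizontal plus vertical gradient of the disparity map."""
    dx = np.abs(np.diff(disp, axis=1)).mean()
    dy = np.abs(np.diff(disp, axis=0)).mean()
    return float(dx + dy)

def training_loss(target, recon, disp, alpha=0.85, lam=0.1):
    """Weighted SSIM + L1 reconstruction term plus a smoothness penalty.

    alpha and lam are illustrative values, not taken from the paper.
    """
    ssim_term = (1.0 - global_ssim(target, recon)) / 2.0
    l1_term = np.abs(target - recon).mean()
    photometric = alpha * ssim_term + (1.0 - alpha) * l1_term
    return float(photometric + lam * smoothness(disp))
```

A perfect reconstruction with a constant disparity map gives a loss of approximately zero, while structural differences between the target and the reconstructed view, or a noisy disparity map, increase it.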

Articles published on Ground-truth Depth Maps

CSRNet: Focusing on critical points for depth completion

MosaicMVS: Mosaic-Based Omnidirectional Multi-View Stereo for Indoor Scenes

Scale-preserving shape reconstruction from monocular endoscope image sequences by supervised depth learning.

A Method for Training Object Scale Estimation System using Feature Extraction Enhancement with Depth Estimation

Improving completeness and accuracy of 3D point clouds by using deep learning for applications of digital twins to civil structures

A Robust Light-Weight Fused-Feature Encoder-Decoder Model for Monocular Facial Depth Estimation From Single Images Trained on Synthetic Data

Monocular Depth Estimation of Old Photos via Collaboration of Monocular and Stereo Networks

Self-supervised Monocular Depth Estimation with 3D Displacement Module for Laparoscopic Images.

Self-supervised recurrent depth estimation with attention mechanisms.

Occlusion-Aware Unsupervised Learning of Depth From 4-D Light Fields.

Depth measurement based on a convolutional neural network and structured light

Extraction of Key-Frames From Endoscopic Videos by Using Depth Information

PASMVS: A perfectly accurate, synthetic, path-traced dataset featuring specular material properties for multi-view stereopsis training and reconstruction applications.

Self-Supervised Multiscale Adversarial Regression Network for Stereo Disparity Estimation.

End-to-End Learning for Omnidirectional Stereo Matching With Uncertainty Prior.

Unsupervised Monocular Depth Estimation Based on Residual Neural Network of Coarse–Refined Feature Extractions for Drone

Real-time Depth Estimation Using Recurrent CNN with Sparse Depth Cues for SLAM System

DEPTH MAP ESTIMATION IN LIGHT FIELDS USING AN STEREO-LIKE TAXONOMY

3-D Depth Reconstruction from a Single Still Image
