Self-supervised Monocular Depth Estimation Research Articles

Self-supervised learning methods are increasingly important for monocular depth estimation because they do not require ground-truth data during training. Although existing methods based on Convolutional Neural Networks (CNNs) have achieved great success, the limited receptive field of CNNs is usually insufficient to model global information, e.g., the relationship between foreground and background or among objects, which is crucial for accurately capturing scene structure. Recently, Transformer-based studies have attracted significant interest in computer vision. However, due to the lack of a spatial locality bias, Transformers may fail to model local information, e.g., fine-grained details within an image. To tackle these issues, we propose a novel self-supervised learning framework that combines the advantages of CNNs and Transformers to model the complete contextual information for high-quality monocular depth estimation. Specifically, the proposed method includes two main branches: the Transformer branch captures global information, while the convolution branch preserves local information. We also design a rectangle convolution module with a pyramid structure to perceive semi-global information, e.g., thin objects, along the horizontal and vertical directions within an image. Moreover, we propose a shape refinement module that learns the affinity matrix between each pixel and its neighborhood to obtain an accurate geometric structure of scenes. Extensive experiments on the KITTI, Cityscapes and Make3D datasets demonstrate that the proposed method achieves competitive results compared with state-of-the-art self-supervised monocular depth estimation methods and shows good cross-dataset generalization ability.
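
The abstract does not spell out how the shape refinement module applies its learned affinity matrix. As a rough illustration only, one common way to use per-pixel affinities is to replace each depth value with an affinity-weighted average over its 3x3 neighborhood. The sketch below assumes that form; the function name `refine_depth_with_affinity`, the 3x3 window, the softmax normalization and the edge padding are all hypothetical choices, not details taken from the paper.

```python
import numpy as np

def refine_depth_with_affinity(depth, affinity):
    """Affinity-weighted average of each pixel's 3x3 neighborhood (illustrative sketch).

    depth:    (H, W) array of initial depth values.
    affinity: (9, H, W) array of unnormalized affinity scores, one per neighbor
              offset in row-major order (index 4 is the pixel itself).
              In a real model a network would predict these; here they are
              plain inputs (assumption).
    """
    h, w = depth.shape
    # Softmax over the 9 neighbors so the weights at every pixel sum to 1.
    shifted = affinity - affinity.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)

    # Replicate border values so every pixel has a full 3x3 neighborhood.
    padded = np.pad(depth, 1, mode="edge")
    refined = np.zeros_like(depth, dtype=np.float64)
    k = 0
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the padded map = the neighbor at offset (dy-1, dx-1).
            refined += weights[k] * padded[dy:dy + h, dx:dx + w]
            k += 1
    return refined
```

With affinities that strongly favor the center pixel the depth map is returned almost unchanged, while uniform affinities reduce to a 3x3 box filter; a learned affinity would sit between these extremes and vary per pixel.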

Self-supervised monocular depth estimation methods have received increasing attention because they do not require large labelled datasets. Such methods rely on high-quality salient features and consequently suffer a severe performance drop in indoor scenes, where the dominant low-textured regions are almost indiscriminative. To address this issue, we propose a self-supervised indoor monocular depth estimation framework called F2Depth. A self-supervised optical flow estimation network is introduced to supervise depth learning. To improve optical flow estimation in low-textured areas, only patches of points with more discriminative features are used for finetuning, based on our well-designed patch-based photometric loss. The finetuned optical flow estimation network generates high-accuracy optical flow as a supervisory signal for depth estimation, and an optical flow consistency loss is designed accordingly. Multi-scale feature maps produced by the finetuned optical flow estimation network are warped to compute a feature map synthesis loss as another supervisory signal for depth learning. Experimental results on the NYU Depth V2 dataset demonstrate the effectiveness of the framework and our proposed losses. To evaluate the generalization ability of F2Depth, we collect a Campus Indoor depth dataset composed of approximately 1500 points selected from 99 images in 18 scenes. Zero-shot generalization experiments on the 7-Scenes dataset and Campus Indoor achieve δ1 accuracies of 75.8% and 76.0%, respectively. These results show that our model generalizes well to monocular images captured in unknown indoor scenes.
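
For readers unfamiliar with the δ1 metric quoted above: it is conventionally the fraction of evaluated points whose predicted depth is within a factor of 1.25 of the ground truth, i.e. max(pred/gt, gt/pred) < 1.25. A minimal sketch of that computation follows. The function name `delta_accuracy` and the masking of non-positive values are assumptions made here for illustration; the abstract does not say how the authors mask points or whether they apply scale alignment before evaluation.

```python
import numpy as np

def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid points with max(pred/gt, gt/pred) < threshold.

    pred, gt: arrays of predicted and ground-truth depths (same shape).
    Points with non-positive depth are skipped (assumption for this sketch).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = (gt > 0) & (pred > 0)
    if not valid.any():
        raise ValueError("no valid depth points to evaluate")
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold))

# Toy example: three of the four points fall within a factor of 1.25.
pred = np.array([1.0, 2.0, 4.0, 10.0])
gt = np.array([1.1, 2.0, 2.0, 9.0])
score = delta_accuracy(pred, gt)
```

In the toy example the third point is off by a factor of 2, so `score` is 0.75. A sparse benchmark such as the roughly 1500 annotated points described above would be evaluated the same way, with `pred` sampled at the annotated pixel locations.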

Articles published on Self-supervised Monocular Depth Estimation

Shufflemono: Rethinking Lightweight Network for Self-Supervised Monocular Depth Estimation

Complete contextual information extraction for self-supervised monocular depth estimation

F2Depth: Self-supervised indoor monocular depth estimation via optical flow consistency and feature map synthesis

Self-supervised monocular depth estimation on water scenes via specular reflection prior

TAMDepth: self-supervised monocular depth estimation with transformer and adapter modulation

AltNeRF: Learning Robust Neural Radiance Field via Alternating Depth-Pose Optimization

PPEA-Depth: Progressive Parameter-Efficient Adaptation for Self-Supervised Monocular Depth Estimation

SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

On Robust Cross-view Consistency in Self-supervised Monocular Depth Estimation

A Foggy Weather Simulation Algorithm for Traffic Image Synthesis Based on Monocular Depth Estimation.

Bridging local and global representations for self-supervised monocular depth estimation

Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions

Dual-attention-based semantic-aware self-supervised monocular depth estimation

IterDepth: Iterative Residual Refinement for Outdoor Self-Supervised Multi-Frame Monocular Depth Estimation

Learn to Adapt for Self-Supervised Monocular Depth Estimation.

Self-Supervised Monocular Depth Estimation With Self-Perceptual Anomaly Handling.

SC-DepthV3: Robust Self-Supervised Monocular Depth Estimation for Dynamic Scenes.

Self-Supervised Monocular Depth Estimation from Videos via Adaptive Reconstruction Constraints

Self-Supervised Monocular Depth Estimation With Positional Shift Depth Variance and Adaptive Disparity Quantization.

Spatial-Aware Dynamic Lightweight Self-Supervised Monocular Depth Estimation
