Abstract

Unsupervised monocular depth estimation with aggregating image features and wavelet SSIM (Structural SIMilarity) loss

Highlights

  • Predicting depth from a single 2D image is a fundamental task in computer vision

  • $|I|$ is the number of pixels in image $I$, $d_i^{\text{pred}}$ is the depth predicted by the model, and $d_i^{\text{gt}}$ is the ground-truth depth. $\delta_t$ is the threshold on the ratio between the ground-truth and predicted depths, set to $1.25$, $1.25^2$, and $1.25^3$, respectively.

  • Network capacity: to show that the proposed network improves accuracy without increasing network capacity, the number of network parameters and the number of floating-point operations (FLOPs) were computed for the proposed network.
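
The threshold accuracy in the second highlight is commonly defined as $\frac{1}{|I|}\sum_{i \in I} \big[\max(d_i^{\text{pred}}/d_i^{\text{gt}},\, d_i^{\text{gt}}/d_i^{\text{pred}}) < \delta_t\big]$. The highlight does not spell out the full formula, so the sketch below assumes this standard definition. The function name `threshold_accuracy` is hypothetical, and the inputs are assumed to be NumPy arrays of strictly positive depths.

```python
import numpy as np


def threshold_accuracy(pred, gt, thresholds=(1.25, 1.25 ** 2, 1.25 ** 3)):
    """Fraction of pixels whose depth ratio falls below each threshold.

    Assumes the standard definition: pixel i counts as correct for
    threshold delta_t if max(pred_i / gt_i, gt_i / pred_i) < delta_t.
    Both inputs must hold strictly positive depths of the same shape.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    # Symmetric ratio: always >= 1, and exactly 1 for a perfect prediction.
    ratio = np.maximum(pred / gt, gt / pred)
    num_pixels = ratio.size  # |I| in the highlight
    return [float(np.count_nonzero(ratio < t)) / num_pixels for t in thresholds]


gt = np.array([[1.0, 2.0], [4.0, 8.0]])
pred = np.array([[1.1, 2.0], [6.0, 16.0]])
print(threshold_accuracy(pred, gt))  # [0.5, 0.75, 0.75]
```

Here the per-pixel ratios are 1.1, 1.0, 1.5, and 2.0, so two of four pixels pass $\delta_1 = 1.25$ and three pass both $1.25^2$ and $1.25^3$. In practice, pixels without valid ground truth would typically be masked out before calling such a function; this sketch does not do that.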
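
Parameter and FLOP counts like these are usually accumulated layer by layer. As a hedged illustration, not the tooling used in the paper, the sketch below computes both for one 2D convolution with the usual formulas: parameters $= k_h k_w \frac{C_{in}}{g} C_{out}$ (plus $C_{out}$ bias terms), and multiply-accumulate operations (MACs) $= k_h k_w \frac{C_{in}}{g} C_{out} H_{out} W_{out}$. Conventions differ on whether one MAC counts as one or two FLOPs, so the hypothetical `flops_per_mac` parameter makes that choice explicit. The function name `conv2d_cost` is also hypothetical.

```python
def conv2d_cost(c_in, c_out, kernel, out_h, out_w,
                groups=1, bias=True, flops_per_mac=2):
    """Return (parameters, FLOPs) for a single 2D convolution layer."""
    k_h, k_w = kernel
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel sees only c_in / groups input channels.
    weights_per_filter = k_h * k_w * (c_in // groups)
    params = weights_per_filter * c_out + (c_out if bias else 0)
    # One MAC per weight per output spatial position.
    macs = weights_per_filter * c_out * out_h * out_w
    return params, macs * flops_per_mac


# 3x3 convolution, 64 -> 128 channels, 8x8 output map.
print(conv2d_cost(64, 128, (3, 3), 8, 8))  # (73856, 9437184)
```

Summing these per-layer values over every layer gives network-level totals that can be compared between architectures.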

Introduction

Predicting depth from a single 2D image is a fundamental task in computer vision. It has been studied for many years and has widespread practical applications, such as visual navigation[1], object tracking[2,3], and surgery[4]. Monocular depth estimation approaches can be classified into three categories: supervised[5,6,7,8,9], semi-supervised[10], and unsupervised[11,12,13,14,15,16,17,18,19]. Both supervised and semi-supervised learning rely on image depth ground truth. A monocular dataset is more general as network input, but training on it requires simultaneously estimating the pose transformation between consecutive frames. A pose estimation network that takes a sequence of frames as input and outputs the relative 6-DoF pose is therefore necessary.
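
Pose networks of this kind commonly output the 6-DoF pose as a 6-vector, three rotation parameters and three translation components, which is then turned into a rigid transformation matrix. The sketch below assumes an axis-angle rotation ordered as (rx, ry, rz, tx, ty, tz); the summary does not say which parameterization or ordering this paper uses, and the function name `pose_vec_to_matrix` is hypothetical. The rotation is built with Rodrigues' formula $R = I + \sin\theta\, K + (1 - \cos\theta) K^2$, where $\theta$ is the norm of the rotation vector and $K$ is the skew-symmetric matrix of its unit axis.

```python
import numpy as np


def pose_vec_to_matrix(pose):
    """Convert a 6-DoF pose (rx, ry, rz, tx, ty, tz) into a 4x4 transform.

    (rx, ry, rz) is assumed to be an axis-angle vector whose norm is the
    rotation angle in radians; (tx, ty, tz) is the translation.
    """
    pose = np.asarray(pose, dtype=np.float64)
    if pose.shape != (6,):
        raise ValueError("pose must have exactly six elements")
    rot_vec, trans = pose[:3], pose[3:]
    theta = np.linalg.norm(rot_vec)
    if theta < 1e-12:
        rotation = np.eye(3)  # (near-)zero rotation
    else:
        kx, ky, kz = rot_vec / theta  # unit rotation axis
        # Skew-symmetric cross-product matrix of the axis.
        k = np.array([[0.0, -kz, ky],
                      [kz, 0.0, -kx],
                      [-ky, kx, 0.0]])
        # Rodrigues' rotation formula.
        rotation = np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)
    transform = np.eye(4)
    transform[:3, :3] = rotation
    transform[:3, 3] = trans
    return transform


# 90-degree rotation about z followed by a unit translation along x.
T = pose_vec_to_matrix([0.0, 0.0, np.pi / 2, 1.0, 0.0, 0.0])
print(np.round(T @ np.array([1.0, 0.0, 0.0, 1.0]), 6))  # point (1, 0, 0) -> (1, 1, 0)
```

Axis-angle is a convenient choice for a network output because any 3-vector maps to a valid rotation, so no normalization or orthogonality constraint has to be enforced on the prediction.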
