Abstract

Unsupervised learning methods have achieved remarkable performance in monocular depth estimation and camera pose estimation, mostly by treating the two as a multi-task learning problem and using their inherent geometric consistency as the self-supervision signal. However, most existing approaches adopt a generative model to obtain the depth map prediction, so the resolution of the predicted depth map leaves room for improvement. To this end, we present an unsupervised learning architecture based on adversarial learning for high-resolution single-view depth and camera pose estimation. Specifically, we present a multi-scale deep convolutional Generative Adversarial Network (GAN) based learning system consisting of three networks: a pose estimation network (PCNN), and a generator (Generator-D) and discriminator (Discriminator-D) for depth map prediction. Furthermore, to generate high-resolution depth maps, we propose a multi-scale GAN model (MSGAN) that decomposes the difficult high-quality image generation problem into more manageable sub-problems through a coarse-to-fine process. We also modify the overall generator architecture by changing the down-sampling and up-sampling components to improve the quality and accuracy of the depth map prediction. Finally, to improve the rate of convergence, we use a least-squares adversarial loss, which increases the penalty on outliers. Detailed quantitative and qualitative evaluations on the KITTI dataset show that the proposed method provides better results for both pose estimation and depth recovery.
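
The following is a minimal sketch, not the authors' released code, of the least-squares adversarial objective described above, applied per scale of a coarse-to-fine depth generator. The function names, the per-scale summation, and the PyTorch framing are illustrative assumptions; in the full system these terms would be combined with the photometric and geometric self-supervision losses.

```python
# Hedged sketch of an LSGAN-style objective for multi-scale depth prediction.
# All names here are illustrative, not taken from the paper's implementation.
import torch
import torch.nn.functional as F

def lsgan_discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Least-squares GAN: push D(real depth) toward 1 and D(generated depth) toward 0.
    # The quadratic penalty grows with the distance from the target label,
    # so samples far from the decision boundary (outliers) are penalized more,
    # which is the convergence benefit mentioned in the abstract.
    return 0.5 * (F.mse_loss(d_real, torch.ones_like(d_real))
                  + F.mse_loss(d_fake, torch.zeros_like(d_fake)))

def lsgan_generator_loss(d_fake: torch.Tensor) -> torch.Tensor:
    # The generator tries to make the discriminator score its depth maps as real (label 1).
    return 0.5 * F.mse_loss(d_fake, torch.ones_like(d_fake))

def multi_scale_generator_loss(d_fake_per_scale: list) -> torch.Tensor:
    # Sum the adversarial term over the coarse-to-fine scales of the generator,
    # so each intermediate-resolution depth map receives its own supervision.
    return sum(lsgan_generator_loss(d) for d in d_fake_per_scale)
```

As a usage note, each element of `d_fake_per_scale` would be the discriminator's response to the generator's depth prediction at one resolution, evaluated from coarsest to finest during the coarse-to-fine pass.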
