Abstract

Depth is an essential component for various scene understanding tasks and for reconstructing the 3D geometry of a scene. Estimating depth from stereo images requires multiple views of the same scene to be captured, which is often not possible when exploring new environments with an Unmanned Aerial Vehicle (UAV). To overcome this, monocular depth estimation has become a topic of interest with the recent advancements in computer vision and deep learning techniques. This research has largely focused on indoor scenes or outdoor scenes captured at ground level. Single image depth estimation from aerial images has been limited due to additional complexities arising from increased camera distance and wider area coverage with many occlusions. A new aerial image dataset is prepared specifically for this purpose, combining UAV images covering different regions, features and points of view. The single image depth estimation is based on image reconstruction techniques, which use stereo images during training to learn to estimate depth from single images. Among the various available models for ground-level single image depth estimation, two models, 1) a Convolutional Neural Network (CNN) and 2) a Generative Adversarial model (GAN), are used to learn depth from aerial images from UAVs. These models generate pixel-wise disparity images which can be converted into depth information. The disparity maps generated by these models are evaluated for their internal quality using various error metrics. The results show higher disparity ranges with smoother images generated by the CNN model, and sharper images with a smaller disparity range generated by the GAN model. The produced disparity images are converted to depth information and compared with point clouds obtained using Pix4D. It is found that the CNN model performs better than the GAN model and produces depth similar to that of Pix4D. This comparison helps in streamlining the efforts to produce depth from a single aerial image.
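
The conversion from disparity to depth mentioned in the abstract can be illustrated with the standard rectified-stereo relation, depth = focal length × baseline / disparity. The following is a minimal sketch only: it assumes the disparity is expressed in pixels, and the function name, focal length and baseline values are hypothetical placeholders, not parameters taken from the paper.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m, min_disparity=1e-6):
    """Convert a 2D grid of disparities (pixels) to depths (metres).

    Uses the rectified-stereo relation depth = f * B / d.
    Disparities at or below min_disparity map to infinity (no usable depth).
    """
    depth = []
    for row in disparity_px:
        depth.append([
            focal_length_px * baseline_m / d if d > min_disparity else float("inf")
            for d in row
        ])
    return depth


# Hypothetical 2x2 disparity map and camera parameters (illustrative values only).
disparity = [[8.0, 4.0], [2.0, 0.0]]
depth = disparity_to_depth(disparity, focal_length_px=1000.0, baseline_m=0.5)

for row in depth:
    print(row)
```

Because depth is inversely proportional to disparity, small disparities (typical for distant ground seen from a UAV) correspond to large depths, so small disparity errors can translate into large depth errors.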

Highlights

  • Depth is an important component for understanding 3D geometrical information of objects from a 2D scene

  • To assess how well the models reproduce what they have learnt during training, they are first tested on images from the training dataset as an internal quality assessment

  • The disparity learnt by the model during the training stage at the last epoch is compared with the disparity generated during testing for the same image
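
A comparison of this kind can be sketched as follows. The source does not name the error metrics used, so mean absolute error (MAE) and root mean square error (RMSE) are assumed here purely for illustration; the function name and the sample values are hypothetical.

```python
import math


def disparity_errors(reference, predicted):
    """Return MAE and RMSE between two equally sized 2D disparity maps."""
    diffs = [
        p - r
        for ref_row, pred_row in zip(reference, predicted)
        for r, p in zip(ref_row, pred_row)
    ]
    if not diffs:
        raise ValueError("disparity maps must not be empty")
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mae": mae, "rmse": rmse}


# Hypothetical disparities for the same image: last training epoch vs. testing.
train_disp = [[10.0, 12.0], [8.0, 6.0]]
test_disp = [[11.0, 12.0], [6.0, 6.0]]

errors = disparity_errors(train_disp, test_disp)
print(errors)
```

Values close to zero would indicate that the model reproduces at test time what it learnt for that image during training.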


Summary

Introduction

Depth is an important component for understanding 3D geometrical information of objects from a 2D scene. Deep learning has a wide range of applications in scene understanding, segmentation, classification and depth estimation tasks (Luo et al, 2018). The successful performance of deep learning techniques in extracting high-level features, and their applications, make them a preferable tool for single image depth estimation (Amirkolaee and Arefi, 2019). There are multiple approaches through which single image depth estimation can be achieved. These include supervised learning, where the models are trained with ground truth depths (Eigen et al, 2014; Liu et al, 2016; Laina et al, 2016; Li et al, 2017; Mou and Zhu, 2018; Amirkolaee and Arefi, 2019). The collection of ground truth depths is a time consuming, expensive and difficult

