Abstract

Predicting depth from a single image is an essential problem in scene understanding, and deep learning has shown great potential in this area. Unsupervised methods use image reconstruction loss as the supervisory signal, which makes them broadly applicable since no ground-truth depth is required. However, most such methods produce depth estimates that are not accurate enough for demanding autonomous-driving scenarios, which limits their adoption. These methods are typically built on fully convolutional networks, the most commonly used architecture for image-to-image tasks. In this paper, aiming to optimize the depth estimation network architecture, we propose two networks that fuse features from different encoding layers for monocular depth estimation: a multilayer information fusion U-Net (FU-Net) and a more lightweight variant (LFU-Net). To improve the efficiency of feature fusion, we further introduce a hybrid attention mechanism, yielding AgFU-Net. We compare our networks with other improvements of U-Net, and the results show that ours are more efficient. We also fine-tune the loss function for the unsupervised depth estimation algorithm. Our improvements achieve results comparable with state-of-the-art unsupervised monocular depth prediction methods on the KITTI benchmark.
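To make the idea of attention-gated skip-connection fusion concrete, here is a minimal sketch in PyTorch. It is modeled on the additive attention gate popularized by Attention U-Net, not on the paper's actual hybrid attention mechanism, which the abstract does not specify; the class name `AttentionGate` and all parameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Additive attention gate over a U-Net skip connection (illustrative;
    the paper's hybrid attention mechanism may differ)."""

    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.theta = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)  # project encoder (skip) features
        self.phi = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)    # project decoder (gating) features
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)          # collapse to a scalar attention map

    def forward(self, skip, gate):
        # Resize the gating signal to the skip resolution if needed.
        if gate.shape[-2:] != skip.shape[-2:]:
            gate = F.interpolate(gate, size=skip.shape[-2:],
                                 mode="bilinear", align_corners=False)
        attn = torch.sigmoid(self.psi(F.relu(self.theta(skip) + self.phi(gate))))
        return skip * attn  # reweight encoder features before fusion


# Usage sketch: gate the skip features with the decoder features,
# then concatenate for the decoder block, as in a standard U-Net.
gate = AttentionGate(skip_ch=64, gate_ch=128, inter_ch=32)
skip_feat = torch.randn(1, 64, 64, 64)     # encoder features
decoder_feat = torch.randn(1, 128, 32, 32) # coarser decoder features
fused = torch.cat(
    [F.interpolate(decoder_feat, size=skip_feat.shape[-2:], mode="bilinear",
                   align_corners=False),
     gate(skip_feat, decoder_feat)], dim=1)
```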
