Abstract

Depth estimation is an essential component in understanding the 3D geometry of a scene. Compared with traditional methods such as structure from motion and stereo matching, estimating depth from a single camera is challenging. Recent advances in convolutional neural networks have accelerated research in monocular depth estimation. However, most approaches infer depth maps from low-resolution images because of network capacity and complexity constraints. Another challenge is ambiguous and sparse depth maps, which result from labeling errors, hardware faults, or occlusions. This paper presents a novel end-to-end trainable convolutional neural network architecture, the depth transverse transformer network (DTTNet). The proposed network is designed and optimized for monocular depth estimation and exploits multi-resolution representations to perform pixel-wise depth estimation more accurately. To further improve the accuracy of depth estimation, several ad hoc networks are subsequently proposed. Extensive experiments on the NYU Depth V2 and SUN RGB-D datasets demonstrate the effectiveness of the proposed DTTNet against state-of-the-art methods. DTTNet can potentially improve depth perception in intelligent systems such as automated driving, video surveillance, computational photography, and augmented reality. The source code is available at https://github.com/shreyaskamathkm/DTTNet.
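
To make the core idea concrete, the sketch below illustrates how a multi-resolution representation can be fused for pixel-wise depth prediction. This is not the authors' DTTNet (whose details are in the full paper and repository); the module name `TinyDepthNet`, the layer widths, and the three-scale encoder-decoder structure are all hypothetical, chosen only to show the general pattern of combining coarse, context-rich features with high-resolution detail.

```python
# Minimal, illustrative multi-resolution depth estimation sketch in PyTorch.
# Hypothetical example, NOT the DTTNet architecture from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyDepthNet(nn.Module):
    """Encoder-decoder with skip connections: the encoder captures coarse
    scene context at low resolution; the decoder restores spatial detail
    by fusing high-resolution encoder features at each scale."""

    def __init__(self):
        super().__init__()
        self.enc1 = self._block(3, 32)    # full resolution
        self.enc2 = self._block(32, 64)   # 1/2 resolution
        self.enc3 = self._block(64, 128)  # 1/4 resolution
        self.dec2 = self._block(128 + 64, 64)
        self.dec1 = self._block(64 + 32, 32)
        self.head = nn.Conv2d(32, 1, kernel_size=3, padding=1)

    @staticmethod
    def _block(c_in, c_out):
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        f1 = self.enc1(x)                    # H x W features
        f2 = self.enc2(F.max_pool2d(f1, 2))  # H/2 x W/2 features
        f3 = self.enc3(F.max_pool2d(f2, 2))  # H/4 x W/4 features
        # Upsample coarse features and fuse with finer skip features.
        u2 = F.interpolate(f3, scale_factor=2, mode="bilinear",
                           align_corners=False)
        d2 = self.dec2(torch.cat([u2, f2], dim=1))
        u1 = F.interpolate(d2, scale_factor=2, mode="bilinear",
                           align_corners=False)
        d1 = self.dec1(torch.cat([u1, f1], dim=1))
        # One depth value per pixel; ReLU keeps predictions non-negative.
        return F.relu(self.head(d1))


if __name__ == "__main__":
    rgb = torch.randn(1, 3, 480, 640)  # NYU Depth V2 frames are 480x640
    depth = TinyDepthNet()(rgb)
    print(depth.shape)                 # torch.Size([1, 1, 480, 640])
```

The fusion steps (`torch.cat` of upsampled coarse features with encoder skips) are one common way to realize a multi-resolution representation; the actual DTTNet design should be taken from the linked source code.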
