Abstract

Monocular depth estimation (MDE) predicts pixel-level depth from a single image and plays a vital role in image sensing. MDE has advanced considerably with the adoption of deep neural networks (DNNs). However, current MDE methods fail to produce satisfactory depth maps because they rarely model the dependencies among convolution channels and ignore location relationships within DNNs. They are also commonly slow at inference on embedded devices owing to their high computational complexity. To tackle these problems, we propose a novel encoder–decoder network (EDNet) for fast MDE inference on diverse embedded devices. Specifically, (1) we design an encoder that re-explores image features and then models nonlinear, dynamic dependencies among convolution channels with an attention mechanism; (2) we propose a decoder containing four plug-and-play blocks that individually extract image features, model dependencies among convolution channels, learn location relationships, and adjust channel numbers; and (3) we optimize EDNet with inference engines to match MDE to different embedded system architectures. Experiments confirm that our root mean square error (RMSE) is at least 3.7% and 5.0% lower than that of state-of-the-art models on the NYU-Depth-v2 and KITTI datasets, respectively. The optimized EDNet simultaneously improves the accuracy, inference speed, and visual quality of MDE results on different embedded devices.
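The abstract does not specify the exact design of the attention blocks, but a minimal sketch of an SE-style channel-attention module, one common way to model nonlinear, dynamic dependencies among convolution channels, might look like the following. All module, parameter, and tensor names here are illustrative assumptions, not the paper's implementation.

# Minimal sketch (assumption): SE-style channel attention that learns
# nonlinear, dynamic per-channel weights. Not the paper's exact block.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: one scalar of global spatial context per channel
        self.fc = nn.Sequential(              # excitation: nonlinear mapping across channels
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                     # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # reweight channels dynamically for each input

# Usage: reweight a feature map from one encoder stage (shapes are illustrative).
feat = torch.randn(2, 64, 60, 80)             # (batch, channels, H, W)
out = ChannelAttention(64)(feat)
assert out.shape == feat.shape

Because the weights are computed from each input's own pooled statistics rather than being fixed parameters, the channel dependencies are input-dependent ("dynamic"), which is what distinguishes this kind of attention from a static learned channel scaling.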
