Abstract

Depth estimation is a prerequisite for building the 3D perception capability of the artificial intelligence of things (AIoT). Real-time inference with extremely low computing resource consumption is critical on edge devices. However, most single-view depth estimation networks focus on improving accuracy when running on high-end GPUs, which runs counter to the real-time requirement of edge devices. To address this issue, this article proposes a novel encoder-decoder network that realizes real-time monocular depth estimation on edge devices. The proposed network merges semantic information over a global receptive field via an efficient transformer-based module, providing richer object detail for depth assignment. The transformer-based module is integrated at the lowest-resolution level of the encoder-decoder architecture, which greatly reduces the number of Vision Transformer (ViT) parameters. In particular, we propose a novel patch convolutional layer for low-latency feature extraction in the encoder and an SConv5 layer for effective depth assignment in the decoder. The proposed network achieves an outstanding balance between accuracy and speed on the NYU Depth v2 dataset: a low RMSE of 0.554 and a fast speed of 58.98 FPS on an NVIDIA Jetson Nano with TensorRT optimization, outperforming most state-of-the-art real-time results.
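
For illustration, the following is a minimal PyTorch sketch of the layout the abstract describes: a convolutional encoder, a lightweight transformer block applied only at the lowest-resolution feature map (so the ViT-style module stays small), and an upsampling decoder that regresses a single-channel depth map. The class names, channel widths, the use of strided convolutions in place of the paper's patch convolutional layer, and the depthwise 5x5 convolution standing in for SConv5 are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the described encoder-decoder layout.
# Names, widths, and layer choices are illustrative assumptions,
# not the network proposed in the article.
import torch
import torch.nn as nn

class BottleneckTransformer(nn.Module):
    """Self-attention on the smallest feature map: few tokens keep the
    ViT-style module cheap in parameters and FLOPs."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        normed = self.norm(tokens)
        tokens = tokens + self.attn(normed, normed, normed)[0]
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions stand in for the paper's
        # patch convolutional feature extractor (assumed).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Global context is modeled only at the bottleneck (lowest resolution).
        self.bottleneck = BottleneckTransformer(128)
        # Decoder: bilinear upsampling plus a depthwise 5x5 convolution as a
        # stand-in for the paper's SConv5 layer (assumed).
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 128, 5, padding=2, groups=128),
            nn.Conv2d(128, 64, 1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 64, 5, padding=2, groups=64),
            nn.Conv2d(64, 32, 1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 3, padding=1),    # single-channel depth map
        )

    def forward(self, x):
        return self.decoder(self.bottleneck(self.encoder(x)))

# Example: a 480x640 RGB frame (the NYU Depth v2 resolution) yields a
# 480x640 depth prediction.
depth = TinyDepthNet()(torch.randn(1, 3, 480, 640))
print(depth.shape)  # torch.Size([1, 1, 480, 640])
```

A model of this form can be exported (e.g. via ONNX) and optimized with TensorRT for deployment on a Jetson Nano, which is the setting the reported 58.98 FPS refers to.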
