Abstract

Monocular depth estimation (MDE) infers pixel-level depth from a single image and is a critical stage of scene sensing on edge devices. Existing MDE studies frequently employ deep neural networks (DNNs), but they still face problems such as trading high computational complexity for greater precision, or sacrificing precision for higher efficiency. To alleviate these issues: 1) we propose an encoder-decoder network (EdgeNet) for precise and fast MDE on different edge devices; when recovering depth in the decoder, we design upsampling modules that aggregate global depth information at low computational complexity, improving the decoder's accuracy by extracting different ranges of depth information; 2) we develop a two-stage channel pruning method that prunes the encoder and decoder separately according to their characteristics, further reducing the latency and model/computational complexity of EdgeNet with little loss of accuracy; and 3) we optimize the pruned EdgeNet to decrease graphics processing unit (GPU) scheduling overhead, accelerating MDE inference by an order of magnitude on the TX2 GPU at a 224 × 224 input resolution. Extensive experiments show that our strategies are effective on different edge GPU devices and input resolutions, in both outdoor and indoor scenes. For example, compared with the state of the art at a 128 × 416 input resolution, the optimized EdgeNet reduces GPU latency by 76.3% on Nano and 89.2% on TX2, with a 2.6% lower root mean square error.
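To make the encoder-decoder idea concrete, the following is a minimal PyTorch sketch (not the authors' released code) of such a network, where the decoder's upsampling block mixes local features with globally pooled context at little extra cost. The class names (EdgeNetSketch, GlobalUpsample), channel sizes, and the use of global average pooling are assumptions chosen for brevity; the paper's actual architecture may differ.

```python
# Illustrative sketch only: a lightweight encoder-decoder for MDE in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalUpsample(nn.Module):
    """Upsampling block that adds globally pooled context to local features,
    exposing a wider range of depth information to the decoder at the cost of
    a single 1x1 convolution (assumed interpretation of the paper's module)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.local = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.global_ctx = nn.Conv2d(in_ch, out_ch, 1)  # applied to pooled features
        self.fuse = nn.Conv2d(out_ch, out_ch, 3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        g = F.adaptive_avg_pool2d(x, 1)   # global statistics, 1x1 spatial size
        g = self.global_ctx(g)            # cheap 1x1 projection
        out = self.local(x) + g           # broadcast global context over the map
        return F.relu(self.fuse(F.relu(out)))


class EdgeNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Lightweight stand-in encoder (a real model might use a MobileNet-style backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder built from the global-context upsampling blocks.
        self.decoder = nn.Sequential(
            GlobalUpsample(128, 64),
            GlobalUpsample(64, 32),
            GlobalUpsample(32, 16),
        )
        self.head = nn.Conv2d(16, 1, 3, padding=1)  # per-pixel depth map

    def forward(self, x):
        return self.head(self.decoder(self.encoder(x)))


if __name__ == "__main__":
    depth = EdgeNetSketch()(torch.randn(1, 3, 224, 224))
    print(depth.shape)  # torch.Size([1, 1, 224, 224])
```

Such a sketch could then be shrunk further by channel pruning (e.g., removing low-importance filters from the encoder and decoder with different criteria), which is the role of the paper's two-stage pruning method.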
