Abstract

Lightweight neural networks that employ depthwise convolution have a significant computational advantage over those that use standard convolution because they involve fewer parameters; however, they incur higher latency, even on graphics processing units (GPUs). We propose a Repetition-Reduction Network (RRNet) in which the number of depthwise channels is large enough to reduce GPU latency while remaining small enough to keep computation low. RRNet also reduces power consumption and memory usage, not only in the encoder but also in the residual connections to the decoder. It has two key modules: the Repetition-Reduction (RR) block, a set of repeated lightweight convolutions used for feature extraction in the encoder, and the Condensed Decoding Connection (CDC), which can replace the skip connection, delivering features to the decoder while significantly reducing the channel depth of the decoder layers. We apply RRNet to resource-constrained depth estimation, where it proves to be significantly more efficient than other methods in terms of energy consumption, memory usage, and computation. Experimental results on the KITTI dataset show that RRNet consumes 3.84x less energy and 3.06x less memory than conventional schemes, and that it is 2.21x faster on a commercial mobile GPU without increasing the demand on hardware resources relative to the baseline network. Furthermore, RRNet outperforms state-of-the-art lightweight models such as MobileNets, PyDNet, DiCENet, DABNet, and EfficientNet.

Highlights

  • Depth estimation is crucial for several computer vision applications

  • We propose a Repetition-Reduction Network (RRNet), an energy-efficient encoder–decoder model based on RR blocks and the Condensed Decoding Connection (CDC) that outperforms current state-of-the-art complex and lightweight models in terms of accuracy, run time, and energy consumption on practical mobile graphics processing unit (GPU) hardware

  • We have observed that although depthwise convolution involves a small amount of computation, its GPU latency is higher than that of other convolution operations such as the 3×3 standard convolution and the pointwise convolution, as described in detail in the Bottleneck part of Section III-A
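The computation side of the observation above can be verified with a back-of-the-envelope multiply-accumulate (MAC) count comparing a standard convolution against a depthwise separable one (a depthwise convolution followed by a 1×1 pointwise convolution). The sketch below is illustrative only: the function names and layer sizes are our own assumptions, not values from the paper, and MAC counts capture only arithmetic cost, not the GPU latency effect the authors highlight.

```python
# Illustrative MAC counts: depthwise separable convolution is far cheaper
# arithmetically than standard convolution, which is why its higher GPU
# latency (from low arithmetic intensity) is counterintuitive.

def standard_conv_macs(h, w, c_in, c_out, k=3):
    """MACs for a k x k standard convolution on an h x w feature map."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k=3):
    """MACs for a k x k depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = h * w * c_in * k * k       # one k x k filter per channel
    pointwise = h * w * c_in * c_out       # 1 x 1 cross-channel mixing
    return depthwise + pointwise

if __name__ == "__main__":
    h = w = 32
    c_in = c_out = 64  # assumed layer sizes, for illustration only
    std = standard_conv_macs(h, w, c_in, c_out)
    dws = depthwise_separable_macs(h, w, c_in, c_out)
    print(f"standard: {std:,} MACs")
    print(f"depthwise separable: {dws:,} MACs ({std / dws:.1f}x fewer)")
```

For this assumed 3×3, 64-channel layer on a 32×32 map, the depthwise separable form needs roughly 8x fewer MACs, yet each depthwise channel performs little arithmetic per memory access, which is what drives up GPU latency in practice.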


Summary

Introduction

Depth estimation is crucial for several computer vision applications. Many technological goals, including localization in augmented reality (AR) or virtual reality (VR), advanced robotics, the reliable operation of autonomous vehicles or drones, and smart factories, cannot be realized without accurate depth estimation. Deep learning approaches [1]–[9] convincingly outperform attempts to solve this problem manually [10], [11]. However, their use in mobile applications that involve a lightweight neural network model and relatively low-end graphics processing units (GPUs) remains limited. As we will show in the subsequent sections of this paper, this can strongly affect performance.

