Abstract

Depth estimation from a single input image can be used in applications such as robotics and autonomous driving. Recently, depth-estimation networks with UNet encoder/decoder structures have been widely used. In these decoders, operations are repeated to gradually increase the image resolution while decreasing the number of channels. If upsampling at a high magnification can be performed in a single step, the amount of computation in the decoder can be dramatically reduced. To achieve this, we propose a new network structure, the cocktail glass network. In this network, the number of convolution layers in the decoder is reduced, and a novel fast upsampling method, channel-to-space unrolling, converts thick channel data into high-resolution data. The proposed method can be easily implemented using simple reshaping operations; therefore, it is suitable for reducing the computational cost of depth-estimation networks. Experimental results on the NYU V2 and KITTI datasets demonstrate that the proposed method reduces the amount of computation in the decoder by half while maintaining the same level of accuracy, and that it can be used in both lightweight and large-model-capacity networks.
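
The idea of turning thick channel data into high-resolution data with reshaping alone can be illustrated with a minimal NumPy sketch. The abstract does not specify the exact channel ordering or tensor layout of channel-to-space unrolling, so the function name `channel_to_space`, the upscale factor `r`, and the (channels, height, width) layout below are assumptions; the ordering follows the common depth-to-space (pixel-shuffle) convention and may differ from the paper's.

```python
import numpy as np

def channel_to_space(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r).

    Hypothetical helper: only reshape and transpose are used, so no
    arithmetic is performed on the values themselves.
    """
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c_out = c_in // (r * r)
    # Split the channel axis into (c_out, r, r): one r-by-r block per output channel.
    x = x.reshape(c_out, r, r, h, w)
    # Interleave the block axes with the spatial axes: (c_out, h, r, w, r).
    x = x.transpose(0, 3, 1, 4, 2)
    # Merge each (spatial, block) pair into a single high-resolution axis.
    return x.reshape(c_out, h * r, w * r)

# A thick, low-resolution feature map: 16 channels at 4x5 (assumed sizes).
features = np.arange(16 * 4 * 5, dtype=np.float32).reshape(16, 4, 5)
depth = channel_to_space(features, r=4)
print(depth.shape)  # (1, 16, 20)
```

In this sketch a single call performs 4x upsampling, and every input value appears exactly once in the output, which is why such an operation adds essentially no computation compared with repeated convolution-and-upsample stages.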

Highlights

  • Monocular depth-estimation (MDE) is a method of estimating the depth of an input image

  • In this study, we propose a modified UNet-style depth-estimation network, called the cocktail glass network (CGN), and a novel fast upsampling method, referred to as channel-to-space unrolling (CSR)

  • The structural modification in CGN focuses on reducing the number of convolution layers in the decoder


Summary

Introduction

Monocular depth-estimation (MDE) is a method of estimating the depth of an input image. ResNet [1] or DenseNet [2] is used as a backbone in heavy-weight networks to achieve the best accuracy, and MobileNet [3] is primarily used in lightweight networks for mobile environments. Various other methods, such as GhostNet [4], FbNet [5], and EfficientNet [6], have recently been proposed for efficient computation, and they can be used as an encoder in a UNet-style MDE network.

