Abstract

Monocular depth prediction has received continuous attention in recent years because of its wide application in autonomous driving, intelligent system navigation, and other fields. Convolutional neural networks dominated monocular depth prediction for a long time, and the recent introduction of Transformer-based and MLP-based architectures in computer vision has provided new ideas for the task. However, these architectures suffer from problems such as high computational complexity and excessive parameter counts. In this paper, we propose MLP-Depth, a lightweight monocular depth prediction method based on a hierarchical multi-stage MLP that uses depth-wise convolution to improve local modeling capability while reducing parameters and computational cost. In addition, we design a multi-scale inverse attention mechanism to implicitly improve the global expressiveness of MLP-Depth. Our method effectively reduces the number of parameters compared with monocular depth prediction networks that use Transformer-like architectures, and extensive experiments show that MLP-Depth achieves competitive results with fewer parameters on challenging outdoor and indoor datasets.
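
To illustrate why depth-wise convolution reduces parameters in an MLP-style block, the sketch below shows a generic residual block that mixes spatial information with a depth-wise convolution and channel information with a 1x1-convolution MLP. This is only an illustrative sketch under assumed layer sizes; the class name DWConvMLPBlock and all hyperparameters are hypothetical and not the authors' MLP-Depth implementation.

```python
# Hypothetical sketch, not the authors' MLP-Depth code.
import torch
import torch.nn as nn


class DWConvMLPBlock(nn.Module):
    """Generic MLP-style block with depth-wise convolution for local mixing."""

    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        hidden = channels * expansion
        self.norm1 = nn.BatchNorm2d(channels)
        self.norm2 = nn.BatchNorm2d(channels)
        # Depth-wise convolution: groups == channels, so each channel is
        # filtered independently, using far fewer parameters than a dense conv.
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=3,
                                padding=1, groups=channels)
        # Channel MLP implemented as 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.dwconv(self.norm1(x))  # local (spatial) mixing
        x = x + self.mlp(self.norm2(x))     # per-location channel mixing
        return x


# Usage: a feature map of shape (batch, channels, height, width).
feats = torch.randn(1, 64, 48, 160)
block = DWConvMLPBlock(64)
out = block(feats)  # same shape as the input
```

For a 3x3 kernel, the depth-wise layer needs only 9 weights per channel instead of 9 x channels per output channel in a standard convolution, which is the source of the parameter savings the abstract refers to.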
