Abstract

Depth estimation can provide auxiliary information for scene perception. Indoor environments generally contain extensive textureless surfaces, such as walls and ceilings, that share similar scene and semantic content. The overly uniform features of such local textureless areas fail to reflect changes in depth, which degrades the performance of existing depth estimation methods. To address this challenge, we propose a dedicated indoor depth estimation method, named MMAIndoor, which provides global semantic guidance and shape priors for depth estimation in local textureless regions. The depth estimation network is designed for efficiency and consists of an initial convolutional stage followed by a latent patched multi-layer perceptron (Pat-MLP) stage. The novel Pat-MLP block uses MLP partitioning to globally model the local depth information produced by the convolutional stage, and it incorporates axial shift operations to extract local information from various spatial locations; this suppresses the smoothing effect of the MLP and improves the estimation of sharp depth changes and small indoor structures. Further, we build a multi-dimensional cross attention (MCA) module to address the weak correlation between current residual connections and the global context. The MCA module captures global dependencies across multiple dimensions by sequentially applying cross attention along the channel and spatial dimensions, which effectively mitigates the semantic gaps in residual connections. Extensive experimental results demonstrate the state-of-the-art performance of MMAIndoor on benchmark datasets including NYUv2, ScanNet, and InteriorNet.
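The abstract does not say how the axial shift inside the Pat-MLP block is implemented. The sketch below is a minimal illustration, assuming a design in the style of AS-MLP: channels are split into groups, each group is shifted by a different offset along one spatial axis, and horizontal and vertical branches are mixed by per-pixel channel MLPs. All function names (`axial_shift`, `channel_mix`, `axial_shift_block`), the zero padding, the ReLU, and the (C, H, W) layout are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np


def shift_with_zero_pad(g: np.ndarray, offset: int, axis: int) -> np.ndarray:
    """Shift `g` by `offset` positions along `axis`; vacated cells become zero."""
    if offset == 0:
        return g.copy()
    out = np.zeros_like(g)
    src = [slice(None)] * g.ndim
    dst = [slice(None)] * g.ndim
    if offset > 0:
        src[axis], dst[axis] = slice(0, -offset), slice(offset, None)
    else:
        src[axis], dst[axis] = slice(-offset, None), slice(0, offset)
    out[tuple(dst)] = g[tuple(src)]
    return out


def axial_shift(x: np.ndarray, shift_size: int = 3, axis: int = 2) -> np.ndarray:
    """Hypothetical axial shift on a (C, H, W) feature map.

    Channels are split into `shift_size` groups; group i is shifted by
    i - shift_size // 2 pixels along `axis` (1 = vertical, 2 = horizontal).
    """
    out = np.empty_like(x)
    for i, idx in enumerate(np.array_split(np.arange(x.shape[0]), shift_size)):
        out[idx] = shift_with_zero_pad(x[idx], i - shift_size // 2, axis)
    return out


def channel_mix(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Per-pixel linear layer over channels (a 1x1 convolution); w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def axial_shift_block(x, w_in, w_h, w_v, w_out, shift_size=3):
    """Hypothetical block: channel mix, parallel horizontal/vertical shifts, residual add."""
    y = np.maximum(channel_mix(x, w_in), 0.0)  # ReLU chosen here for simplicity
    h = channel_mix(axial_shift(y, shift_size, axis=2), w_h)  # horizontal branch
    v = channel_mix(axial_shift(y, shift_size, axis=1), w_v)  # vertical branch
    return x + channel_mix(h + v, w_out)  # residual connection
```

Because each channel group moves by a different offset, the subsequent channel mixing at any pixel sees values from its left/right (or upper/lower) neighbours. A single bright pixel therefore contributes to adjacent positions after one block, which is the kind of local spatial sensitivity the abstract credits with countering the smoothing effect of a plain MLP.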

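The abstract describes the MCA module only as cross attention applied sequentially along the channel and spatial dimensions of a residual (skip) connection. As a rough illustration under stated assumptions, the sketch below uses decoder features as queries and encoder skip features as keys and values, omits learned query/key/value projections and multiple heads, and fuses the result by addition. The function names and every one of these choices are hypothetical, not taken from the paper.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def channel_cross_attention(dec: np.ndarray, enc: np.ndarray) -> np.ndarray:
    """Queries from decoder channels, keys/values from encoder channels; inputs are (C, N)."""
    n = dec.shape[1]
    attn = softmax(dec @ enc.T / np.sqrt(n))  # (C, C) channel-to-channel weights
    return attn @ enc  # (C, N)


def spatial_cross_attention(dec: np.ndarray, enc: np.ndarray) -> np.ndarray:
    """Queries from decoder positions, keys/values from encoder positions; inputs are (C, N)."""
    c = dec.shape[0]
    attn = softmax(dec.T @ enc / np.sqrt(c))  # (N, N) position-to-position weights
    return (attn @ enc.T).T  # (C, N)


def multi_dim_cross_attention(dec: np.ndarray, enc: np.ndarray) -> np.ndarray:
    """Hypothetical MCA: channel cross attention, then spatial cross attention, on the skip path."""
    skip = channel_cross_attention(dec, enc)
    skip = spatial_cross_attention(dec, skip)
    return dec + skip  # fuse the refined skip features into the decoder path
```

A (C, H, W) feature map would be flattened to (C, H*W) with `reshape` before calling these functions. Note that the spatial step builds an N x N attention matrix, so its cost grows quadratically with the number of pixels; a practical module would likely apply it at reduced resolution.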