Abstract
6DoF camera localization is an important component of autonomous driving and navigation. Deep learning has achieved impressive results in localization, but its robustness in dynamic environments has not been adequately addressed. In this paper, we propose a framework based on hybrid attention mechanism which can be generally applied to existing CNN-based pose regressors to improve their robustness in dynamic environments. Specifically, we propose a novel Transformer Bottleneck (TBo) block including convolution, channel attention, and a position-aware self-attention mechanism, which extracts more geometrically robust features by capturing the corresponding long-term dependencies between pixels. Furthermore, we introduce shuffle attention (SA) before the pose regressor, which integrates feature information in both spatial and channel dimensions, forcing the network to learn geometrically robust features, reducing the effects of dynamic objects and illumination conditions to improve camera localization accuracy. We evaluate our method on commonly benchmarked indoor and outdoor datasets and the experimental results show that our proposed method can significantly improve localization performance compared compare favorably to contemporary pose regressors schemes. In addition, extensive ablation evaluations are conducted to prove the effectiveness of our proposed hybrid attention bottleneck block for pose regression networks.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.