Visual relocalization is crucial for applications such as simultaneous localization and mapping (SLAM) and augmented reality. State-of-the-art methods based on multiple-view geometry struggle to localize robustly in challenging environments because of visual perturbations. Recently, deep learning-based absolute pose regression (APR) has gained significant attention owing to its robust feature extraction capabilities, yet it consistently falls short of the localization accuracy achieved by structure-based methods. In this paper, a novel visual localization framework is developed that fuses geometric structure-based methods with neural network inference via the matrix Fisher distribution on the special orthogonal group SO(3). Specifically, a deep neural network is trained to output an exponential probability density on SO(3), namely the matrix Fisher distribution, which avoids enforcing explicit constraints on the rotational variables and simultaneously models the uncertainty of the rotation estimates. A Bayesian fusion method is then introduced to combine the orientation predictions recovered from the Fisher parameters with the rotation estimates computed by a structure-based approach. The fusion results are concatenated with the feature encoding of the image to account for the correlation between position and orientation, and the position prediction is produced by a fully connected layer. The performance of the developed framework is evaluated on public indoor and outdoor datasets, demonstrating that the approach significantly improves localization accuracy while remaining robust in challenging environments.
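
As a minimal sketch of the probabilistic model referenced above (the notation F, c(F), and G is assumed here for illustration and is not taken from the abstract itself): the matrix Fisher distribution over a rotation R in SO(3) with a parameter matrix F in R^{3x3}, which the network would predict, has density

p(R; F) = \frac{1}{c(F)} \exp\!\big(\operatorname{tr}(F^{\top} R)\big), \qquad R \in \mathrm{SO}(3),

where c(F) is a normalizing constant. With the singular value decomposition F = U S V^{\top}, the mode is \hat{R} = U \operatorname{diag}(1, 1, \det(U V^{\top})) V^{\top}, and the singular values in S act as concentration parameters, so smaller values indicate larger rotational uncertainty. If the structure-based rotation estimate were likewise expressed as a matrix Fisher density with parameter G, the product of the two densities is proportional to \exp\big(\operatorname{tr}((F + G)^{\top} R)\big), so Bayesian fusion would reduce to adding parameter matrices; this is one plausible reading of the fusion step, not a statement of the paper's exact formulation.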