Enhancing rotational accuracy in deep learning-based visual-inertial odometry via geometric-aware representations and state-space fusion
Enhancing rotational accuracy in deep learning-based visual-inertial odometry via geometric-aware representations and state-space fusion
- Research Article
2
- 10.1007/s44196-025-00942-0
- Aug 7, 2025
- International Journal of Computational Intelligence Systems
This study presents an innovative hybrid Visual-Inertial Odometry (VIO) method for Unmanned Aerial Vehicles (UAVs) that is resilient to environmental challenges and capable of dynamically assessing sensor reliability. Built upon a loosely coupled sensor fusion architecture, the system utilizes a novel hybrid Quaternion-focused Error-State EKF/UKF (Qf-ES-EKF/UKF) architecture to process inertial measurement unit (IMU) data. This architecture first propagates the entire state using an Error-State Extended Kalman Filter (ESKF) and then applies a targeted Scaled Unscented Kalman Filter (SUKF) step to refine only the orientation. This sequential process blends the accuracy of SUKF in quaternion estimation with the overall computational efficiency of ESKF. The reliability of visual measurements is assessed via a dynamic sensor confidence score based on metrics, such as image entropy, intensity variation, motion blur, and inference quality, adapting the measurement noise covariance to ensure stable pose estimation even under challenging conditions. Comprehensive experimental analyses on the EuRoC MAV dataset demonstrate key advantages: an average improvement of 49% in position accuracy in challenging scenarios, an average of 57% in rotation accuracy over ESKF-based methods, and SUKF-comparable accuracy achieved with approximately 48% lower computational cost than a full SUKF implementation. These findings demonstrate that the presented approach strikes an effective balance between computational efficiency and estimation accuracy, and significantly enhances UAV pose estimation performance in complex environments with varying sensor reliability.
- Conference Article
19
- 10.1109/iros45743.2020.9341592
- Oct 24, 2020
This paper presents a new visual-inertial odometry, term DUI-VIO, for estimating the motion state of an RGB-D camera. First, a Gaussian mixture model (GMM) to is employed to model the uncertainty of the depth data for each pixel on the camera’s color image. Second, the uncertainties are incorporated into the VIO’s initialization and optimization processes to make the state estimate more accurate. In order to perform the initialization process, we propose a hybrid-perspective-n-point (PnP) method to compute the pose change between two camera frames and use the result to triangulate the depth for an initial set of visual features whose depth values are unavailable from the camera. Hybrid-PnP first uses a 2D-2D PnP algorithm to compute rotation so that more visual features may be used to obtain a more accurate rotation estimate. It then uses a 3D-2D scheme to compute translation by taking into account the uncertainties of depth data, resulting in a more accurate translation estimate. The more accurate pose change estimated by Hybrid-PnP help to improve the initialization result and thus the VIO performance in state estimation. In addition, Hybrid-PnP make it possible to compute the pose change by using a small number of features with a known depth. This improves the reliability of the initialization process. Finally, DUI-VIO incorporates the uncertainties of the inverse depth measurements into the nonlinear optimization process, leading to a reduced state estimation error. Experimental results validate that the proposed DUI-VIO method outperforms the state-of-the-art VIO methods in terms of accuracy and reliability
- Research Article
9
- 10.1088/1361-6501/ad9627
- Dec 3, 2024
- Measurement Science and Technology
In low-light environments, the scarcity of visual information makes feature extraction and matching challenging for traditional visual simultaneous localization and mapping (SLAM) systems. Changes in ambient lighting can also reduce the accuracy and recall of loop closure detection. Most existing image enhancement methods tend to introduce noise, artifacts, and color distortions when enhancing images. To address these issues, we propose an innovative low-light visual-inertial (LL-VI) SLAM system, named LL-VI SLAM, which integrates an image enhancement network into the front end of the SLAM system. This system consists of a learning-based low-light enhancement network and an improved visual-inertial odometry. Our low-light enhancement network, composed of a Retinex-based enhancer and a U-Net-based denoiser, enhances image brightness while mitigating the adverse effects of noise and artifacts. Additionally, we incorporate a robust Inertial Measurement Unit initialization process at the front end of the system to accurately estimate gyroscope biases and improve rotational estimation accuracy. Experimental results demonstrate that LL-VI SLAM outperforms existing methods on three datasets, namely LOLv1, ETH3D, and TUM VI, as well as in real-world scenarios. Our approach achieves a peak signal-to-noise ratio of 22.08 dB. Moreover, on the TUM VI dataset, our system reduces localization error by 22.05% compared to ORB-SLAM3, proving the accuracy and robustness of the proposed method in low-light environments.