Abstract
In practical applications, how to achieve a perfect balance between high accuracy and computational efficiency can be the main challenge for simultaneous localization and mapping (SLAM). To solve this challenge, we propose SD-VIS, a novel fast and accurate semi-direct visual-inertial SLAM framework, which can estimate camera motion and structure of surrounding sparse scenes. In the initialization procedure, we align the pre-integrated IMU measurements and visual images and calibrate out the metric scale, initial velocity, gravity vector, and gyroscope bias by using multiple view geometry (MVG) theory based on the feature-based method. At the front-end, keyframes are tracked by feature-based method and used for back-end optimization and loop closure detection, while non-keyframes are utilized for fast-tracking by direct method. This strategy makes the system not only have the better real-time performance of direct method, but also have high accuracy and loop closing detection ability based on feature-based method. At the back-end, we propose a sliding window-based tightly-coupled optimization framework, which can get more accurate state estimation by minimizing the visual and IMU measurement errors. In order to limit the computational complexity, we adopt the marginalization strategy to fix the number of keyframes in the sliding window. Experimental evaluation on EuRoC dataset demonstrates the feasibility and superior real-time performance of SD-VIS. Compared with state-of-the-art SLAM systems, we can achieve a better balance between accuracy and speed.
Highlights
Simultaneous localization and mapping (SLAM) plays an important role in self-driving cars, virtual reality, unmanned aerial vehicles (UAV), augmented reality and artificial intelligence [1,2].This technology can provide reliable state estimation for UAV and self-driving cars in GPS-denied environments by relying on its sensors
Various types of sensors can be utilized in SLAM, such as stereo camera, lidar, inertial measurement units (IMU), and monocular camera
They have significant disadvantages when used individually: the metric scale of stereo camera can be obtained directly by using fixed baseline length, but it can only be estimated accurately in a limited depth range [3]; lidar has high precision in indoor, but it will encounter the reflection problem of glass surface in outdoor [4]; cheap IMUs are extremely susceptible to bias and noise [5]; monocular camera cannot estimate the absolute metric scale [6]
Summary
Simultaneous localization and mapping (SLAM) plays an important role in self-driving cars, virtual reality, unmanned aerial vehicles (UAV), augmented reality and artificial intelligence [1,2]. The direct method considers the entire image or some pixels with a large gradient and directly estimates the camera motion and scene structure by minimizing the photometric error [11,12,13,14]. In [25,26], different semi-direct approaches were proposed for stereo odometry Both methods use feature-based tracking to obtain a motion prior, and perform direct semi-dense or sparse alignment to refine the camera pose. KLT sparse optical flow algorithm, which can further reduce the end, we only need to extract new feature points on the keyframes and track them with KLT sparse calculation while ensuring accuracy. SD-VIS are aretracked trackedby byfeature-based feature-based method, which is used for sliding window non-linear optimization and loop closure detection.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.