Abstract

Simultaneous localization and mapping (SLAM) is one of the cornerstones of autonomous navigation systems in robotics and the automotive industry. Visual SLAM (V-SLAM), which relies on image features such as keypoints and descriptors to estimate the pose transformation between consecutive frames, is a highly efficient and effective approach for gathering environmental information. With the rise of representation learning, feature detectors based on deep neural networks (DNNs) have emerged as an alternative to handcrafted solutions. This work examines the integration of sparse learned features into a state-of-the-art SLAM framework and benchmarks the handcrafted and learning-based approaches through in-depth experiments. Specifically, we replace the ORB detector and BRIEF descriptor of the ORB-SLAM3 pipeline with those provided by SuperPoint, a DNN model that jointly computes keypoints and descriptors. Experiments on three publicly available datasets from different application domains were conducted to evaluate the pose estimation performance and resource usage of both solutions.
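The swap described above is possible because both front ends expose the same interface: an image goes in, and a set of keypoints with associated descriptors comes out, which the rest of the SLAM pipeline consumes unchanged. A minimal Python sketch of that abstraction follows; the class names and the stand-in "detection" logic are purely illustrative (they are not the paper's code, and no real ORB or SuperPoint model is invoked), but they show the shape of the data each extractor must produce:

```python
import numpy as np

class HandcraftedExtractor:
    """Stand-in for an ORB detector + BRIEF descriptor (binary descriptors)."""
    def extract(self, image: np.ndarray):
        n = 4  # number of keypoints to return (illustrative)
        # Placeholder "detector": pick the n brightest pixels as keypoints.
        flat = np.argsort(image.ravel())[-n:]
        keypoints = np.stack(np.unravel_index(flat, image.shape), axis=1)
        # BRIEF-like descriptors are binary strings (here random placeholders).
        descriptors = np.random.default_rng(0).integers(0, 2, (n, 256), dtype=np.uint8)
        return keypoints, descriptors

class LearnedExtractor:
    """Stand-in for a SuperPoint-like network (dense float descriptors)."""
    def extract(self, image: np.ndarray):
        n = 4
        flat = np.argsort(image.ravel())[-n:]
        keypoints = np.stack(np.unravel_index(flat, image.shape), axis=1)
        # SuperPoint-style descriptors are real-valued and L2-normalized.
        descriptors = np.random.default_rng(1).normal(size=(n, 256)).astype(np.float32)
        descriptors /= np.linalg.norm(descriptors, axis=1, keepdims=True)
        return keypoints, descriptors

def front_end_inputs(extractor, frame: np.ndarray):
    """The SLAM front end only needs (keypoints, descriptors) per frame."""
    return extractor.extract(frame)

frame = np.random.default_rng(42).random((48, 64))
for ex in (HandcraftedExtractor(), LearnedExtractor()):
    kps, desc = front_end_inputs(ex, frame)
    print(type(ex).__name__, kps.shape, desc.shape)
```

One practical difference hidden behind this shared interface, as the benchmarks in the paper probe, is descriptor type: binary descriptors are matched with Hamming distance, while learned float descriptors typically use (normalized) Euclidean or cosine distance, which affects both matching code and runtime cost.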
