Abstract The assumptions of a static environment and scene rigidity are important theoretical underpinnings of traditional visual simultaneous localization and mapping (SLAM) algorithms. However, these assumptions rarely hold in dynamic environments containing non-rigid objects, and such algorithms cannot effectively handle the locally moving regions of non-rigid objects, which seriously degrades the robustness and accuracy of localization and mapping. To address these problems, we improve ORB-SLAM3 and propose Strong-SLAM, a real-time RGB-D visual SLAM framework for dynamic environments based on StrongSORT. First, YOLOv7-tiny is combined with StrongSORT to match the semantic information of dynamic targets across frames. Optical flow and epipolar constraints are then used to extract initial geometric and motion information between adjacent frames. Next, a background model and a Gaussian residual model are constructed from an improved adaptive threshold segmentation algorithm and geometric residuals to further extract the geometric information of dynamic targets. Finally, semantic and geometric information are fused to classify all features by motion level, and motion probabilities and optimization weights are defined for use in global pose estimation and optimization. Experimental results on the publicly available TUM RGB-D dataset show that Strong-SLAM reduces the absolute trajectory error and relative pose error by at least 90% compared with ORB-SLAM3, achieving performance comparable to state-of-the-art dynamic SLAM solutions.
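For illustration, the sketch below shows how optical flow tracking combined with an epipolar constraint can flag candidate dynamic features between adjacent frames, which is the kind of geometric cue the abstract describes. This is a minimal example, not the authors' implementation: the function name, the pixel threshold, and the use of OpenCV primitives are assumptions made for the example.

```python
import cv2
import numpy as np

def flag_dynamic_features(prev_gray, curr_gray, prev_pts, epi_thresh=1.0):
    """Track features with LK optical flow and mark points whose distance to
    their epipolar line exceeds a threshold as candidate dynamic points.
    Illustrative sketch only; the threshold and details are assumptions."""
    # Track the previous keypoints into the current frame with pyramidal LK flow.
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)
    good = status.ravel() == 1
    p0, p1 = prev_pts[good], curr_pts[good]

    # Estimate the fundamental matrix from the tracked correspondences (RANSAC).
    F, _ = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC, 1.0, 0.99)
    if F is None:
        return p1, np.zeros(len(p1), dtype=bool)

    # Distance of each current point to the epipolar line l' = F x of its match.
    ones = np.ones((len(p0), 1), dtype=np.float64)
    x0 = np.hstack([p0.reshape(-1, 2), ones])   # homogeneous points, previous frame
    x1 = np.hstack([p1.reshape(-1, 2), ones])   # homogeneous points, current frame
    lines = (F @ x0.T).T                        # epipolar lines in the current image
    num = np.abs(np.sum(lines * x1, axis=1))
    den = np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2) + 1e-12
    dist = num / den

    # Points far from their epipolar line violate the static-scene assumption.
    is_dynamic = dist > epi_thresh
    return p1, is_dynamic
```

In the framework described above, such geometric cues would be only one input: they are fused with the YOLOv7-tiny/StrongSORT semantic tracks and the residual-based background model before features are assigned motion probabilities and weights for pose optimization.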