To address the problem that dynamic objects in real scenes (such as pedestrians, vehicles, and animals) degrade the localization and mapping accuracy of visual SLAM (Simultaneous Localization and Mapping), a YOLOv3-ORB-SLAM3 algorithm is proposed based on ORB-SLAM3 (Oriented FAST and Rotated BRIEF-Simultaneous Localization and Mapping 3). The algorithm adds a semantic thread to ORB-SLAM3 and adopts a dual-thread mechanism for extracting dynamic and static scene features: the semantic thread uses YOLOv3 to detect dynamic objects in the scene, and feature points extracted from the dynamic regions are removed as outliers; the tracking thread extracts ORB features from the scene, combines them with the semantic information to obtain static scene features, and sends these to the back end. This eliminates the interference of dynamic content and improves the localization accuracy of the visual SLAM algorithm. In experiments on the TUM (Technical University of Munich) dataset, the ATE (Absolute Trajectory Error) of YOLOv3-ORB-SLAM3 in monocular mode is about 30% lower than that of ORB-SLAM3; in RGB-D (Red, Green and Blue-Depth) mode, the ATE on dynamic sequences is reduced by 10%, while static sequences show no significant reduction.
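
The filtering step described above, in which feature points falling inside detected dynamic regions are discarded before tracking, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` type, the `DYNAMIC_CLASSES` set, and the function names are hypothetical, keypoints are reduced to plain pixel coordinates, and the detector output is assumed to be axis-aligned bounding boxes with class labels.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]               # (x, y) pixel coordinates of a keypoint
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels

# Hypothetical set of labels treated as dynamic. The abstract names
# pedestrians, vehicles, and animals only as examples.
DYNAMIC_CLASSES = frozenset({"person", "car", "dog"})


@dataclass(frozen=True)
class Detection:
    """One detector output: a class label and its bounding box (assumed format)."""
    label: str
    box: Box


def in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box, borders included."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def filter_static_keypoints(
    keypoints: Sequence[Point],
    detections: Iterable[Detection],
    dynamic_classes: frozenset = DYNAMIC_CLASSES,
) -> List[Point]:
    """Keep only the keypoints that fall outside every dynamic-object box."""
    dynamic_boxes = [d.box for d in detections if d.label in dynamic_classes]
    return [
        kp for kp in keypoints
        if not any(in_box(kp, box) for box in dynamic_boxes)
    ]


if __name__ == "__main__":
    keypoints = [(10.0, 10.0), (50.0, 60.0), (200.0, 150.0)]
    detections = [
        Detection("person", (40.0, 40.0, 100.0, 120.0)),   # dynamic: masks (50, 60)
        Detection("chair", (180.0, 130.0, 220.0, 170.0)),  # static: ignored
    ]
    print(filter_static_keypoints(keypoints, detections))
```

A bounding-box test is the simplest possible mask and also discards static background points that happen to lie inside a box; whether the proposed algorithm refines this further is not stated in the abstract.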