Abstract
Moving humans and other dynamic objects pose many challenges to robot self‐localisation and environment perception. To adapt to dynamic environments, SLAM researchers typically apply deep learning image segmentation models to eliminate these moving obstacles. However, such segmentation methods demand more computational resources than the onboard processors of mobile robots can reasonably provide. In typical industrial multi‐robot collaboration scenarios, the noise emitted by mobile robots is easily detected by onboard audio sensors, and the direction of a sound source can be acquired effectively by sound source estimation algorithms, but estimating the distance of a sound source is difficult. Conversely, in visual perception, the 3D structure of the scene is relatively easy to obtain, whereas recognising and segmenting moving objects is harder. To address these problems, a novel vision‐audio fusion method that combines sound source localisation with a visual SLAM scheme is proposed, thereby eliminating the effect of dynamic obstacles on multi‐agent systems. Experiments with several heterogeneous robots in different dynamic scenes demonstrate that our method achieves very stable self‐localisation and environment reconstruction.
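A minimal sketch of the complementary audio‐visual idea the abstract describes: the microphone array supplies only a bearing (direction of arrival, DOA) to a noisy moving robot, while the depth map from visual SLAM supplies range along that bearing; intersecting the two localises the moving obstacle so it can be masked out of SLAM tracking. This is an illustrative assumption of how such a fusion could work, not the paper's actual implementation; the function name, the camera intrinsics `K`, and the fixed mask radius are all hypothetical.

```python
import numpy as np

def fuse_doa_with_depth(doa_cam, depth, K, radius_px=40):
    """Intersect an audio DOA ray (unit vector in the camera frame) with a
    dense depth map to recover the sound source's 3D position, and return a
    boolean mask of pixels to treat as a dynamic obstacle.

    Hypothetical sketch: assumes the DOA has already been rotated into the
    camera frame and that the source lies within the camera's field of view.
    """
    h, w = depth.shape
    if doa_cam[2] <= 0:  # source behind the camera: nothing to mask
        return None, np.zeros((h, w), dtype=bool)
    # Pinhole projection of the bearing onto the image plane.
    uvw = K @ (doa_cam / doa_cam[2])
    u, v = int(round(uvw[0])), int(round(uvw[1]))
    if not (0 <= u < w and 0 <= v < h):
        return None, np.zeros((h, w), dtype=bool)
    # Audio gives direction; visual depth supplies the missing range.
    z = depth[v, u]
    source_xyz = (doa_cam / doa_cam[2]) * z  # back-project: bearing * range
    # Mask a disc around the source; these pixels would be excluded from
    # feature tracking and map updates as a moving obstacle.
    vv, uu = np.mgrid[0:h, 0:w]
    mask = (uu - u) ** 2 + (vv - v) ** 2 <= radius_px ** 2
    return source_xyz, mask
```

In use, the returned mask would simply be subtracted from the region where the SLAM front end extracts features, which is far cheaper than running a segmentation network on every frame, consistent with the computational argument made above.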