SIF-TF: A Scene-Interaction fusion Transformer for trajectory prediction

Fei Gao, Wanjun Huang, Libo Weng, Yuanming Zhang

doi:10.1016/j.knosys.2024.111744

Abstract

Accurate pedestrian trajectory prediction is essential for the advancement of intelligent robots and autonomous vehicles, and it remains a challenging task. This paper proposes a Scene-Interaction fusion Transformer (SIF-TF) for trajectory prediction that takes into account three fundamental factors: social interaction, past trajectory, and semantic scene. A scene-social modeling method is added to the model to integrate social interaction with the semantic scene. The proposed SIF-TF contains two critical components: the scene-social transformer and the temporal transformer. The scene-social transformer captures social interaction and semantic scene information, while the temporal transformer extracts temporal correlation information. Furthermore, SIF-TF employs a two-stage trajectory prediction approach to jointly generate future trajectories. To evaluate its effectiveness, comparative experiments were conducted on five widely used public datasets. The experimental results, with an average ADE/FDE of 0.23/0.47, show that SIF-TF significantly outperforms other state-of-the-art methods. These findings demonstrate that the proposed SIF-TF can deliver more precise pedestrian trajectory predictions across diverse scene backgrounds.
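
The abstract reports results as ADE/FDE without defining the metrics. Under the conventional definitions, ADE (average displacement error) is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE (final displacement error) is that distance at the last predicted step. The following is a minimal sketch of those definitions for a single trajectory; the function name `ade_fde` and the sample coordinates are hypothetical and are not taken from the paper, and the abstract does not specify the evaluation protocol or units used.

```python
import math


def ade_fde(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points.

    Hypothetical helper illustrating the conventional metric definitions;
    it is not the paper's evaluation code.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at each time step.
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(errors) / len(errors)  # mean error over all predicted steps
    fde = errors[-1]                 # error at the final predicted step
    return ade, fde


# Made-up example: the prediction moves straight along x,
# while the true path drifts diagonally away from it.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = ade_fde(pred, truth)
print(f"ADE={ade:.2f}, FDE={fde:.2f}")
```

In benchmark settings the per-trajectory values are typically averaged over all pedestrians and test scenes, which is presumably how a single averaged figure such as 0.23/0.47 is obtained.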

Full Text