Pedestrian trajectory prediction is an important area in computer vision, with wide applications in autonomous driving, robot path planning, and surveillance systems. The core underlying technique of these applications is pattern recognition. A key challenge in this area is modeling social interactions between pedestrians, such as pedestrian view area and group behaviors. However, although many methods have been proposed to model social interactions, pedestrian view area and group behaviors have not been explored together to account for complex situations. Additionally, most existing studies require additional detectors and manual annotations to handle view area and group interactions, respectively. In this paper, we propose a dual-branch spatio-temporal graph neural network to automatically model view area and grouping together. Specifically, a spatio-temporal graph attention network (STGAT) branch is designed to handle pedestrian view area, and a spatio-temporal graph convolutional network (STGCN) branch is designed to model group interactions. The features of these branches are then fused to provide better feature representations, on which a temporal convolution operation (TCN) is performed for trajectory prediction. Experiments on public standard datasets demonstrate that the proposed method achieves very competitive performance and predicts socially acceptable trajectories in different challenging scenarios.
Read full abstract