Abstract

Human trajectory prediction is a crucial yet challenging problem of fundamental importance to robotics and autonomous vehicles. The core challenge lies in effectively modeling socially aware spatial interactions and complex temporal dependencies within crowds. However, previous methods usually model spatial and temporal information separately and use only individual features, ignoring group-level motion characteristics. To address these issues, we propose GA-STT, a novel group-aware spatial-temporal transformer network for trajectory prediction. Specifically, we first learn individual representations supervised by group-based annotations. We then model the complex spatial and temporal interactions with separate spatial and temporal transformers, and fuse the resulting spatial and temporal embeddings through a cross-attention mechanism. Results on the publicly available ETH/UCY datasets show that our model outperforms the state-of-the-art method by 19.4% in average displacement error (ADE) and 16.9% in final displacement error (FDE), and that it successfully predicts complex spatial-temporal interactions.
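
For reference, the two reported metrics can be sketched under their usual definitions: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE is that distance at the final predicted step only. This is a minimal sketch assuming NumPy and arrays of shape (N, T, 2); the function name `ade_fde` is hypothetical, and the paper's exact evaluation protocol (observation and prediction lengths, or how multiple sampled predictions are handled) is not specified here.

```python
import numpy as np


def ade_fde(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Hypothetical helper: average and final displacement error.

    pred, gt: arrays of shape (num_agents, pred_len, 2) holding (x, y)
    positions in the same world coordinate frame.
    """
    if pred.shape != gt.shape or pred.ndim != 3 or pred.shape[-1] != 2:
        raise ValueError("expected matching arrays of shape (N, T, 2)")
    # Euclidean distance between prediction and ground truth at every step: (N, T)
    dist = np.linalg.norm(pred - gt, axis=-1)
    # ADE: averaged over all agents and all predicted time steps
    ade = float(dist.mean())
    # FDE: only the last predicted time step, averaged over agents
    fde = float(dist[:, -1].mean())
    return ade, fde


if __name__ == "__main__":
    # One agent, three predicted steps; per-step errors are 0, 5 and 10
    pred = np.zeros((1, 3, 2))
    gt = np.array([[[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]])
    print(ade_fde(pred, gt))  # (5.0, 10.0)
```

A relative improvement such as the 19.4% quoted above would then be computed as (baseline ADE - new ADE) / baseline ADE on the same evaluation split.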
