Abstract

Human trajectory prediction is a challenging task with important applications such as intelligent surveillance and autonomous driving. We recognize that pedestrians in close and distant neighborhoods have different impacts on the person’s decision of future movements. Local scene context and global scene layout also affect the movement decision differently. Existing methods have not adequately addressed these interactions between humans and the multi-level contexts occurring at different spatial and temporal scales. To this end, we propose a multi-level context-driven interaction modeling (MCDIM) method for human future trajectory learning and prediction. Specifically, we construct a multilayer graph attention network (GAT) to model the hierarchical human–human interactions. An extra set of long short-term memory networks is designed to capture the correlations of these human–human interactions at different temporal scales. To model the human–scene interactions, we explicitly extract and encode the global scene layout features and local context features in the neighborhood of the person at each time step and capture the spatial–temporal information of the interactions between human and the local scene contexts. The human–human and human–scene interactions are incorporated into the multi-level GAT-based network for accurate prediction of future trajectories. We have evaluated the method on benchmark datasets: the walking pedestrians dataset provided by ETH Zurich (ETH) and the crowd data provided by the University of Cyprus. The results demonstrate that our MCDIM method outperforms existing methods, being able to generate more accurate and plausible trajectories for pedestrians. The average performance gain is 2 and 3 percentage points in terms of the average displacement error and final displacement error, respectively.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call