Abstract

An essential prerequisite for autonomous vehicles deployed in urban scenarios is the ability to accurately recognize the behavioral intentions of pedestrians and other vulnerable road users and to take measures to ensure their safety. In this paper, a spatial-temporal feature fusion-based multi-attention network (STFF-MANet) is designed to predict pedestrian crossing intention. Pedestrian information, vehicle information, scene context, and optical flow are extracted from continuous image sequences as feature sources. A lightweight 3D convolutional network is designed to extract temporal features from the optical flow, and a spatial encoding module is constructed to extract spatial features from the scene context. Pedestrian motion information is re-encoded using a stack of gated recurrent units. The final network structure is determined through ablation studies, which introduce attention mechanisms into the network to merge the pedestrian motion features with the spatio-temporal features. The effectiveness of the proposed method is demonstrated by comparison experiments on the JAAD and PIE datasets. On the JAAD dataset, intention recognition accuracy is 9% higher than that of existing techniques.
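The attention-based fusion step described above can be illustrated with a minimal sketch: feature vectors from each source (GRU-encoded pedestrian motion, 3D-convolutional optical-flow features, and spatial context encodings) are scored against a learned query, normalized with a softmax, and combined as a weighted sum. This is an assumed, simplified reading of the fusion mechanism, not the authors' implementation; all names and shapes here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(features, w_query):
    """Fuse feature vectors from several sources by attention weighting.

    features: list of (d,)-shaped vectors, one per feature source.
    w_query:  (d,)-shaped learned query vector (hypothetical parameter).
    Returns the fused (d,) vector and the per-source attention weights.
    """
    F = np.stack(features)        # (n_sources, d)
    scores = F @ w_query          # one scalar score per source
    weights = softmax(scores)     # normalized attention over sources
    return weights @ F, weights   # weighted sum of source features

# Toy stand-ins for the three feature streams named in the abstract.
rng = np.random.default_rng(0)
d = 8
motion_feat = rng.standard_normal(d)    # GRU-encoded pedestrian motion
temporal_feat = rng.standard_normal(d)  # 3D-conv optical-flow features
spatial_feat = rng.standard_normal(d)   # spatial context encoding
w_query = rng.standard_normal(d)

fused, weights = attention_fuse(
    [motion_feat, temporal_feat, spatial_feat], w_query
)
```

In a trained network the query (and typically per-source projections) would be learned end to end; this sketch only shows the score-normalize-sum pattern common to attention-based feature fusion.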
