Abstract

An essential prerequisite for deploying autonomous vehicles in urban scenarios is the ability to accurately recognize the behavioral intentions of pedestrians and other vulnerable road users and to take measures to ensure their safety. In this paper, a spatial-temporal feature fusion-based multi-attention network (STFF-MANet) is designed to predict pedestrian crossing intention. Pedestrian information, vehicle information, scene context, and optical flow are extracted from continuous image sequences as feature sources. A lightweight 3D convolutional network is designed to extract temporal features from the optical flow. A spatial encoding module is constructed to extract spatial features from the scene context. Pedestrian motion information is re-encoded using a set of gated recurrent units. The final network structure is determined through ablation studies and introduces attention mechanisms to fuse the pedestrian motion features with the spatio-temporal features. The effectiveness of the proposed method is demonstrated by comparison experiments on the JAAD and PIE datasets. On the JAAD dataset, the intention recognition accuracy is 9% higher than that of existing techniques.
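
The fusion step can be illustrated with a minimal sketch. The abstract does not specify the form of the attention mechanisms, so the example below assumes plain scaled dot-product attention, with the pedestrian motion feature acting as the query over the other feature branches. The function names, the feature vectors, and their dimensionality are all hypothetical and are not taken from STFF-MANet.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, features):
    """Fuse equal-length feature vectors by scaled dot-product attention.

    `query` stands in for the pedestrian motion feature; `features` stands in
    for the outputs of the other branches. Both roles are assumptions.
    """
    scale = math.sqrt(len(query))
    # One relevance score per branch: scaled dot product with the query.
    scores = [sum(q * f for q, f in zip(query, feat)) / scale for feat in features]
    weights = softmax(scores)
    # Weighted sum of the branch features, element by element.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(len(query))
    ]
    return fused, weights


# Hypothetical 4-dimensional features, chosen only for illustration.
motion = [1.0, 0.0, 0.0, 0.0]   # e.g. output of the recurrent motion encoder
flow = [1.0, 0.0, 0.0, 0.0]     # e.g. temporal feature from optical flow
context = [0.0, 1.0, 0.0, 0.0]  # e.g. spatial feature from scene context

fused, weights = attention_fuse(motion, [flow, context])
```

In this toy case the optical-flow feature aligns with the motion query, so it receives the larger attention weight. A trained network would learn projections for the query, keys, and values rather than use the raw features directly.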
