Abstract
Anticipating pedestrian crossing behavior in urban scenarios is a challenging task for autonomous vehicles. Early this year, a benchmark comprising the JAAD and PIE datasets was released, in which several state-of-the-art methods are ranked. However, most of the ranked temporal models rely on recurrent architectures. We propose, to the best of our knowledge, the first self-attention alternative, based on the transformer architecture, which has had enormous success in natural language processing (NLP) and, more recently, in computer vision. Our architecture is composed of several branches that fuse video and kinematic data. The video branch is based on one of two architectures: RubiksNet or TimeSformer. The kinematic branch is based on different configurations of the transformer encoder. Several experiments were performed, focusing mainly on the pre-processing of input data, and they highlight problems with two kinematic data sources: pose keypoints and ego-vehicle speed. In terms of F1 score, our model's results are comparable to those of PCPA, the best-performing model in the benchmark. Furthermore, using only bounding box coordinates and image data, our model surpasses PCPA by a larger margin. Our model has proven to be a valid alternative to recurrent architectures, offering advantages such as parallelization and whole-sequence processing, which allow it to learn relationships between samples that recurrent architectures cannot capture.
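To make the whole-sequence processing mentioned in the abstract concrete, the following is a minimal sketch of single-head scaled dot-product self-attention over a short sequence of bounding boxes. It is not the authors' implementation: the function names, the identity query/key/value projections, and the box values are assumptions chosen for illustration, and a trained transformer encoder would learn its projections and stack several such layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the inputs themselves
    (identity projections); a real encoder learns these projections.
    Returns (outputs, weights), where weights[i][j] is how strongly
    time step i attends to time step j.
    """
    dim = len(sequence[0])
    scale = math.sqrt(dim)
    weights = []
    for query in sequence:
        # Every step is scored against every other step, so no step has
        # to wait for a hidden state from the previous one.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in sequence
        ]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[d] for w, value in zip(row, sequence)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical normalized bounding boxes (x1, y1, x2, y2) for four frames.
boxes = [
    [0.10, 0.40, 0.20, 0.80],
    [0.12, 0.40, 0.22, 0.80],
    [0.15, 0.41, 0.25, 0.81],
    [0.19, 0.41, 0.29, 0.82],
]

outputs, weights = self_attention(boxes)
for step, row in enumerate(weights):
    print(step, [round(w, 3) for w in row])
```

Each row of `weights` sums to one and covers all frames at once, which is the property that lets a self-attention model relate distant samples directly, whereas a recurrent model passes information step by step.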
Highlights
Road safety is one of the main concerns worldwide: road traffic injuries are the eighth leading cause of death and the leading cause among young people between 5 and 29 years old
Accuracy is not a good performance estimator in imbalanced problems, so we focused our analysis of the results on the F1 score (F1) and the area under the Receiver Operating Characteristic (ROC) curve
A similar case is shown in the bottom-left, showing that the network can generalize to different pedestrians and scenes and is able to focus on motion information
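The highlight about accuracy on imbalanced problems can be illustrated with a small worked example. The sketch below uses made-up labels (90 non-crossing, 10 crossing) and two hypothetical predictors; the numbers are not taken from JAAD, PIE, or the benchmark, and the helper functions are illustrative rather than part of any library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall; 0.0 when there are no positives at all."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced labels: 90 non-crossing (0), then 10 crossing (1).
y_true = [0] * 90 + [1] * 10

# Predictor A never predicts a crossing.
always_negative = [0] * 100

# Predictor B raises 4 false alarms and finds 6 of the 10 crossings.
mixed = [0] * 86 + [1] * 4 + [1] * 6 + [0] * 4

for name, y_pred in [("always_negative", always_negative), ("mixed", mixed)]:
    print(name, accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

Predictor A reaches 0.90 accuracy while never detecting a crossing, so its F1 is 0.0; predictor B scores 0.92 accuracy and an F1 of 0.6. The accuracies differ by two points while the F1 scores separate the two predictors clearly, which is why F1 is the more informative estimator here.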
Summary
Road safety is one of the main concerns worldwide: road traffic injuries are the eighth leading cause of death and the leading cause among young people between 5 and 29 years old. Traffic accidents caused approximately 1.35 million deaths and between 20 and 50 million non-fatal injuries worldwide in 2016. The social and psychological problems arising from road accidents are accompanied by a considerable economic impact, costing 3% of gross domestic product in most countries [1]. Vulnerable road users (VRUs) represent more than half of all these deaths. VRUs are the most affected group on urban roads in Europe, with 40% of the total VRU deaths, as stated by the European Transport Safety Council (ETSC) [2]. Thanks to various European Union (EU) initiatives and actions, the number of road fatalities has been decreasing since 2011, following a promising and continuous trend.