Abstract

The tactical driver behavior modeling problem requires understanding driver actions in complicated urban scenarios from rich multimodal signals, including video, LiDAR, and CAN bus data streams. However, the majority of deep learning research focuses either on learning the vehicle/environment state (sensor fusion) or the driver policy (from temporal data), but not both. Learning both tasks jointly offers the richest distillation of knowledge but presents challenges in formulation and successful training. In this work, we propose promising first steps in this direction. Inspired by the gating mechanisms in Long Short-Term Memory (LSTM) units, we propose Gated Recurrent Fusion Units (GRFU) that learn fusion weighting and temporal weighting simultaneously. We demonstrate their superior performance over multimodal and temporal baselines on supervised regression and classification tasks, all in the realm of autonomous navigation. On tactical driver behavior classification using the Honda Driving Dataset (HDD), we report a $10\%$ improvement in mean Average Precision (mAP), and on steering angle regression using the TORCS dataset, we note a $20\%$ drop in Mean Squared Error (MSE) over the state of the art.

Highlights

  • Recently, the domain of autonomous driving has emerged as one of the hotbeds for deep learning research, bolstered by strong industry support and the availability of large real-world datasets and physically/visually realistic simulators ([5], [6], [7], [8])

  • Contributions of this work: We introduce a novel recurrent neural network unit, called the Gated Recurrent Fusion Unit (GRFU) that can jointly learn fusion and target prediction from temporal data

  • We identify two important ways to mitigate the above-mentioned issues: a) delay fusion and pass each sensor in parallel through M Long Short-Term Memory (LSTM) cells, allowing each sensor to individually decide how much of its respective history to use with the current sensor input, and b) define gates for each sensor that determine the contribution of each sensor encoding to the fused cell and output states (see the sketch after this list)
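
The delayed-fusion-plus-gating idea can be illustrated with a short PyTorch sketch. This is a minimal illustration under assumed layer sizes and gate parameterization (one LSTMCell per sensor, followed by a sigmoid gate over each sensor's hidden state); it is not the authors' exact GRFU formulation, and the class and variable names are hypothetical.

```python
import torch
import torch.nn as nn


class GatedRecurrentFusionSketch(nn.Module):
    """Illustrative sketch of delayed fusion with per-sensor gating.

    Each of M sensor streams is passed through its own LSTMCell, and a
    learned sigmoid gate per sensor weights that sensor's contribution to
    the fused hidden and cell states. Layer sizes and the gate
    parameterization are assumptions, not the paper's exact GRFU equations.
    """

    def __init__(self, input_sizes, hidden_size):
        super().__init__()
        # One recurrent cell per sensor (delayed fusion).
        self.cells = nn.ModuleList(
            [nn.LSTMCell(d, hidden_size) for d in input_sizes]
        )
        # One gate per sensor, conditioned on that sensor's hidden state.
        self.gates = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in input_sizes]
        )

    def forward(self, inputs, state):
        """inputs: list of (batch, input_size_m) tensors, one per sensor.
        state: (h_fused, c_fused), each of shape (batch, hidden_size)."""
        h_fused, c_fused = state
        h_new, c_new = 0.0, 0.0
        for x_m, cell, gate in zip(inputs, self.cells, self.gates):
            # Each sensor first updates its own LSTM state from the shared
            # fused state, deciding how much of its history to keep.
            h_m, c_m = cell(x_m, (h_fused, c_fused))
            # A sensor-specific gate decides this sensor's contribution
            # to the fused cell and output states.
            g_m = torch.sigmoid(gate(h_m))
            h_new = h_new + g_m * h_m
            c_new = c_new + g_m * c_m
        return h_new, c_new


# Minimal usage with three hypothetical streams (video, LiDAR, CAN features).
if __name__ == "__main__":
    fusion = GatedRecurrentFusionSketch(input_sizes=[128, 64, 16], hidden_size=32)
    batch = 4
    h = torch.zeros(batch, 32)
    c = torch.zeros(batch, 32)
    xs = [torch.randn(batch, d) for d in (128, 64, 16)]
    h, c = fusion(xs, (h, c))
    print(h.shape)  # torch.Size([4, 32])
```

The design choice mirrored here is that fusion happens only after each sensor's own recurrent update, so a noisy or uninformative stream can be down-weighted at every time step without corrupting the shared fused state.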

Introduction

Recently, the domain of autonomous driving has emerged as one of the hotbeds for deep learning research, bolstered by strong industry support and the availability of large real-world datasets (such as KITTI [1], Berkeley Driving Dataset [2], Honda Driving Dataset [3], Argoverse [4]) and physically/visually realistic simulators ([5], [6], [7], [8]). Multisensor temporal data, provided by these datasets, offers more leverage to understand optimal driving actions.
