Improving Driver Gaze Prediction With Reinforced Attention

Kai Lv,Wei Li,Zhang Xiong,Hao Sheng,Liang Zheng

doi:10.1109/tmm.2020.3038311

Abstract

We consider the task of driver gaze prediction: estimating where a driver's attention should be focused, based on raw video of the outside environment. In practice, we output a probability map that gives the normalized probability of each point in a given scene being the object of the driver's attention. Most existing methods (i.e., Coarse-to-Fine and Multi-branch) take an image or a video as input and directly output the fixation map. While successful, these methods often produce highly scattered predictions, rendering them unreliable for real-world use. Motivated by this observation, we propose the reinforced attention (RA) model as a regulatory mechanism to increase prediction density. Our method is built directly on top of existing methods, making it complementary to current approaches. Specifically, we first use Multi-branch to obtain an initial fixation map. Then, RA is trained using deep reinforcement learning to learn a location prediction policy, producing a reinforced attention. Finally, to obtain the final gaze prediction, we combine the fixation map and the reinforced attention by a mask-guided multiplication. Experimental results show that our framework improves the accuracy of gaze prediction and achieves state-of-the-art performance on the DR(eye)VE dataset.
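The final combination step can be illustrated with a toy example. The text here does not give the exact form of the mask-guided multiplication, so the sketch below is only one plausible reading under stated assumptions: the reinforced attention is taken to be a predicted location, turned into a hard square mask, and the product is renormalized to sum to 1. The names `box_mask`, `mask_guided_multiply`, `radius`, and `outside` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Map = List[List[float]]


def box_mask(height: int, width: int, center: Tuple[int, int],
             radius: int, outside: float = 0.0) -> Map:
    """Assumed mask shape: 1.0 inside a square window around `center`
    (row, col), and the weight `outside` everywhere else."""
    r0, c0 = center
    return [
        [1.0 if abs(r - r0) <= radius and abs(c - c0) <= radius else outside
         for c in range(width)]
        for r in range(height)
    ]


def mask_guided_multiply(fixation: Map, mask: Map) -> Map:
    """Multiply the fixation map by the mask element-wise, then renormalize
    so the result is again a probability map (assumed, not from the paper)."""
    product = [[f * m for f, m in zip(f_row, m_row)]
               for f_row, m_row in zip(fixation, mask)]
    total = sum(sum(row) for row in product)
    if total <= 0:
        # The mask removed all probability mass: fall back to the initial map.
        return [row[:] for row in fixation]
    return [[v / total for v in row] for row in product]


# Toy 3x3 initial fixation map (sums to 1) with scattered mass.
fixation = [[0.1, 0.1, 0.1],
            [0.1, 0.4, 0.1],
            [0.1, 0.0, 0.0]]

# Hypothetical reinforced attention: the policy points at the center cell.
ra_mask = box_mask(3, 3, center=(1, 1), radius=0, outside=0.5)
final = mask_guided_multiply(fixation, ra_mask)
```

In this toy case the mass outside the attended window is halved before renormalization, so the prediction becomes more concentrated around the attended location while remaining a valid probability map.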

Highlights

Autonomous and assisted driving are some of the most active research areas in computer vision
Based on existing gaze prediction approaches, we introduce reinforced attention (RA) into the framework to estimate the attention
To evaluate the proposed method, we compare our approach with state-of-the-art methods in two respects: the accuracy of gaze prediction and the accuracy of reinforced attention

Summary

Introduction

Autonomous and assisted driving are some of the most active research areas in computer vision. Work in these areas has focused on lane change assistance [1], traffic sign recognition [2], and many more [3]. The goal of gaze prediction is to suggest to the driver where they should focus. In this task, the gaze points are gathered from real driving scenes and are defined as the ground truth of the training dataset. Gaze is defined as a probability map in which each point in a given scene has a value. This value denotes the probability that the point is the object of the driver's gaze.
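To make the probability-map definition concrete, the sketch below normalizes a small grid of non-negative scores so that the values over the whole scene sum to 1. It assumes simple sum normalization with a uniform fallback; the function name `normalize_to_probability_map` and the toy scores are illustrative and not taken from the paper.

```python
from typing import List


def normalize_to_probability_map(scores: List[List[float]]) -> List[List[float]]:
    """Scale a non-empty grid of non-negative scores so all entries sum to 1."""
    total = sum(sum(row) for row in scores)
    if total <= 0:
        # No mass anywhere: fall back to a uniform map over the scene.
        height, width = len(scores), len(scores[0])
        return [[1.0 / (height * width)] * width for _ in range(height)]
    return [[value / total for value in row] for row in scores]


# Toy 2x3 "scene": larger scores mean more recorded gaze at that point.
raw_scores = [[0.0, 1.0, 3.0],
              [0.0, 4.0, 2.0]]
prob_map = normalize_to_probability_map(raw_scores)
```

Each entry of `prob_map` can then be read as the probability that the corresponding point is the object of the driver's gaze.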

Objectives

Methods

Results

Conclusion