Abstract

Inferring the gaze target, also known as gaze following, is an effective way to understand human actions and intentions, but it remains quite challenging. Existing studies on gaze estimation cannot accurately locate the gaze target in a 3D scene from the gaze direction alone, while studies on gaze following have failed to exploit the context of the 3D scene. In this article, we make full use of the information provided by an RGB-D camera and extend gaze target estimation from the 2D image to 3D space through a predicted 3D gaze vector. Specifically, we build a new 3D gaze-following dataset, the RGB-D Attention dataset, which contains real-world 3D gaze behaviors. In addition, we augment the GazeFollow dataset with depth information so that its diverse scenes can be used when training for 3D gaze following. Then, treating gaze direction as a crucial cue, we propose a novel gaze vector space that encodes 3D information and a 3D gaze pathway that learns gaze behavior in the 3D scene. After two-stage training, the model outputs a predicted 3D gaze vector and a predicted gaze heatmap, which the inference algorithm combines to estimate the 3D gaze target. Experiments in 3D scenes show that our method reduces the average distance error to 0.307 m and the average angle error to 19.8°. Compared with the state-of-the-art gaze inference method, our approach reduces the prediction error by more than 45%. Our web page is at https://sites.google.com/view/3dgazefollow.
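To make the inference step concrete, the sketch below shows one simple way a 3D gaze target could be selected once an eye position, a predicted 3D gaze vector, and a depth-derived point cloud are available: pick the scene point whose direction from the eye best aligns with the gaze ray. This is an illustrative assumption, not the paper's actual algorithm (which also uses the predicted 2D gaze heatmap); all function and variable names here are hypothetical.

```python
import numpy as np

def estimate_3d_gaze_target(eye_pos, gaze_vec, scene_points, max_angle_deg=20.0):
    """Illustrative sketch: choose the scene point whose direction from
    the eye is most aligned with the predicted gaze vector.

    eye_pos:      (3,) 3D position of the subject's eyes (e.g. from RGB-D)
    gaze_vec:     (3,) predicted 3D gaze direction (need not be unit length)
    scene_points: (N, 3) candidate 3D points from the depth map
    Returns the best-aligned point, or None if no point lies within
    max_angle_deg of the gaze ray.
    """
    gaze_vec = gaze_vec / np.linalg.norm(gaze_vec)
    dirs = scene_points - eye_pos                    # (N, 3) eye-to-point directions
    dists = np.linalg.norm(dirs, axis=1)
    valid = dists > 1e-6                             # drop points at the eye itself
    dirs = dirs[valid] / dists[valid, None]          # normalize each direction
    cos_sim = dirs @ gaze_vec                        # cosine of angular deviation
    best = np.argmax(cos_sim)
    angle = np.degrees(np.arccos(np.clip(cos_sim[best], -1.0, 1.0)))
    if angle > max_angle_deg:
        return None                                  # nothing close enough to the ray
    return scene_points[valid][best]
```

In practice the candidate set would come from back-projecting the depth image through the camera intrinsics, and the gaze heatmap would re-weight candidates before selection.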
