Human gaze provides valuable information on human focus and intentions, making it a crucial area of research. Recently, deep learning has revolutionized appearance-based gaze estimation. However, due to the unique features of gaze estimation research, such as the unfair comparison between 2D gaze positions and 3D gaze vectors and the different pre-processing and post-processing methods, there is a lack of a definitive guideline for developing deep learning-based gaze estimation algorithms. In this paper, we present a systematic review of the appearance-based gaze estimation methods using deep learning. First, we survey the existing gaze estimation algorithms along the typical gaze estimation pipeline: deep feature extraction, deep learning model design, personal calibration and platforms. Second, to fairly compare the performance of different approaches, we summarize the data pre-processing and post-processing methods, including face/eye detection, data rectification, 2D/3D gaze conversion and gaze origin conversion. Finally, we set up a comprehensive benchmark for deep learning-based gaze estimation. We characterize all the public datasets and provide the source code of typical gaze estimation algorithms. This paper serves not only as a reference to develop deep learning-based gaze estimation methods, but also a guideline for future gaze estimation research.
Read full abstract