Abstract

Gaze prediction is a significant problem in efficiently processing and understanding the large volume of incoming visual signals in first-person views (i.e., egocentric vision). Because many visual processes are computationally expensive and human beings do not process the whole visual field, knowing the gaze position is an efficient way to identify the salient content of a video and what users pay attention to. However, current methods for gaze prediction are bottom-up and cannot incorporate information about user actions. We propose a supervised gaze prediction framework based on a residual network that takes user actions into consideration. Our model uses features extracted by the VGG-16 deep neural network to predict the gaze position in first-person-view (FPV) videos, and deep residual blocks are combined with this model to learn residual maps. Our method aims for high gaze prediction accuracy; according to the experimental results, its performance is competitive with that of state-of-the-art approaches.
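To make the described pipeline concrete, the following PyTorch sketch shows one way VGG-16 convolutional features could feed residual blocks that regress a gaze heatmap. This is a minimal illustration, not the authors' released code: the `GazePredictor` and `ResidualBlock` names, the number of blocks, the channel sizes, and the heatmap-shaped output are all assumptions made for this example.

```python
# Hypothetical sketch of "VGG-16 features + residual blocks -> gaze map".
# Not the paper's implementation; shapes and names are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import vgg16


class ResidualBlock(nn.Module):
    """Basic residual block: two 3x3 convs plus an identity skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # skip connection: the convs learn a residual map


class GazePredictor(nn.Module):
    """VGG-16 backbone, residual refinement, and a 1-channel gaze-heatmap head."""

    def __init__(self, num_blocks=2):
        super().__init__()
        # Frozen ImageNet-pretrained convolutional feature extractor.
        self.backbone = vgg16(weights="IMAGENET1K_V1").features
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.blocks = nn.Sequential(*[ResidualBlock(512) for _ in range(num_blocks)])
        self.head = nn.Conv2d(512, 1, kernel_size=1)  # per-location gaze score

    def forward(self, frames):            # frames: (N, 3, 224, 224)
        feats = self.backbone(frames)     # (N, 512, 7, 7) for 224x224 input
        feats = self.blocks(feats)        # residual refinement of the features
        return self.head(feats)           # coarse gaze heatmap, (N, 1, 7, 7)


model = GazePredictor()
heatmap = model(torch.randn(1, 3, 224, 224))
print(heatmap.shape)  # torch.Size([1, 1, 7, 7])
```

In a setup like this, the coarse heatmap would typically be upsampled to the frame resolution and trained against ground-truth fixation maps with a pixel-wise loss such as binary cross-entropy or KL divergence; the gaze position is then read off as the heatmap's argmax.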
