Abstract

Predicting the gaze point on mobile devices without calibration in unconstrained environments is of great significance for human-computer interaction. Appearance-based gaze estimation methods have improved thanks to recent advances in convolutional neural network (CNN) models and the availability of large-scale datasets. However, CNN models are limited in capturing global information and tend to overlook important local features. In this paper, we propose a novel architecture named GazeAttentionNet. To improve the accuracy of gaze estimation, we use global and local attention modules to exploit both global and local features. First, we use MobileNetV2 together with self-attention layers as the global attention module to extract global features. Second, we add a local attention module containing spatial attention to extract local features. With GazeAttentionNet, we achieve excellent results on the GazeCapture dataset: average errors of 1.67 cm on mobile phones and 2.37 cm on tablets.
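
To make the described architecture more concrete, the following is a minimal PyTorch sketch of the two branches named in the abstract, a MobileNetV2 backbone followed by self-attention for the global branch and a spatial-attention block for the local branch. The layer sizes, the choice of a CBAM-style spatial attention, and the fusion/regression head are assumptions for illustration; they are not the authors' published implementation.

```python
# Illustrative sketch only: layer sizes, the CBAM-style spatial attention,
# and the fusion head are assumptions, not the authors' exact design.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2


class GlobalAttentionModule(nn.Module):
    """MobileNetV2 features followed by a self-attention layer (assumed layout)."""
    def __init__(self, embed_dim=1280, num_heads=8):
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features        # -> [B, 1280, H', W']
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x):
        feat = self.backbone(x)                                     # [B, C, H', W']
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)                    # [B, H'*W', C]
        out, _ = self.attn(tokens, tokens, tokens)                  # global self-attention
        return out.mean(dim=1)                                      # [B, C] pooled global feature


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention; a common choice, assumed here."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, feat):
        avg = feat.mean(dim=1, keepdim=True)                        # [B, 1, H, W]
        mx, _ = feat.max(dim=1, keepdim=True)                       # [B, 1, H, W]
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return feat * mask                                          # reweighted local features


class GazeAttentionNetSketch(nn.Module):
    """Hypothetical fusion of the two branches into a 2-D gaze-point regressor."""
    def __init__(self):
        super().__init__()
        self.global_branch = GlobalAttentionModule()
        self.local_backbone = mobilenet_v2(weights=None).features[:7]  # shallow features, 32 ch (assumed)
        self.local_attn = SpatialAttention()
        self.head = nn.Linear(1280 + 32, 2)                         # predicts (x, y) on screen

    def forward(self, x):
        g = self.global_branch(x)                                   # [B, 1280]
        l = self.local_attn(self.local_backbone(x))                 # [B, 32, h, w]
        l = l.mean(dim=(2, 3))                                      # [B, 32]
        return self.head(torch.cat([g, l], dim=1))                  # [B, 2] gaze point
```

The key idea reflected here is that the global branch lets every spatial position attend to every other, while the local branch reweights the feature map so informative regions (e.g. the eyes) dominate before regression.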
