Abstract

Appearance-based gaze estimation has been widely studied in recent years with promising performance. The majority of appearance-based gaze estimation methods are developed under deterministic frameworks. However, deterministic gaze estimation methods suffer from large performance drops on challenging eye images, e.g., those with low resolution, darkness, or partial occlusion. To alleviate this problem, in this article we reformulate the appearance-based gaze estimation problem under a generative framework. Specifically, we propose a variational inference model, the variational gaze estimation network (VGE-Net), to generate multiple gaze maps as complementary candidates, simultaneously supervised by the ground-truth gaze map. To achieve robust estimation, we adaptively fuse the gaze directions predicted from these candidate gaze maps by a regression network through a simple attention mechanism. Experiments on three benchmarks, namely MPIIGaze, EYEDIAP, and Columbia, demonstrate that our VGE-Net outperforms state-of-the-art gaze estimation methods, especially on challenging cases. Comprehensive ablation studies also validate the effectiveness of our contributions. The code will be publicly released.
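The abstract gives no implementation details, so the following is a minimal sketch of the attention-based fusion step it describes, written in PyTorch under our own assumptions: the module name AttentionFusion, the per-candidate feature dimension feat_dim, and the 2-D (yaw, pitch) gaze parameterization are illustrative choices, not the authors' code.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Hypothetical sketch: fuse gaze directions regressed from K
    candidate gaze maps with a simple softmax attention mechanism."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # One scalar attention score per candidate gaze map.
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor, gazes: torch.Tensor) -> torch.Tensor:
        # feats: (B, K, feat_dim) features of the K candidate gaze maps
        # gazes: (B, K, 2)        gaze direction (yaw, pitch) per candidate
        weights = torch.softmax(self.score(feats), dim=1)  # (B, K, 1)
        # Weighted average of the candidate predictions -> fused estimate.
        return (weights * gazes).sum(dim=1)                # (B, 2)

# Example usage with assumed shapes (batch of 8, K = 5 candidates).
fusion = AttentionFusion(feat_dim=128)
feats = torch.randn(8, 5, 128)
gazes = torch.randn(8, 5, 2)
fused = fusion(feats, gazes)  # shape: (8, 2)

This mirrors the idea stated in the abstract: rather than trusting a single deterministic prediction, several generated candidates are combined, so a degraded candidate (e.g., from an occluded eye region) can be down-weighted by its attention score.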
