Eye tracking has garnered attention in human–machine interaction, disease monitoring, biometrics, and related fields. Existing investigations have predominantly concentrated on individual tasks such as pupil detection or gaze estimation, overlooking the implicit relationships among different eye-tracking tasks. In this work, we introduce a cascaded Transformer framework that collaboratively performs eye landmark detection, eye state detection, and gaze estimation. Within our framework, we leverage the Transformer to capture long-range dependencies using explicit eye-related structural information and implicit correlations among tasks. Furthermore, the proposed cascade iteration framework alternately optimizes each task, boosting the overall performance of pupil center localization, eye state detection, and gaze estimation simultaneously. To address the burden of manual annotation, we further introduce the Control-Eye Diffusion Model (CEDM), a controllable eye image generation method conditioned on a simple contour containing structural information. The proposed methods are evaluated on challenging datasets including GI4E, BioID, and MPIIGaze, and the results show that they outperform state-of-the-art methods on several tasks.