Abstract

Gaze estimation is a prominent field within artificial intelligence and machine learning that is developing rapidly because of its practical applications. This development, however, brings challenges such as inaccuracy across different facial features and external factors such as lighting or camera quality. Prior research introduced the cross-encoder, a mechanism that swaps disentangled features between pairs of images. The proposed method takes this one step further by incorporating multi-view images to leverage the disentanglement: multi-view capture provides more image pairs whose features can be swapped, yielding finer-grained and more accurate gaze estimation. The method also uses transfer learning, carrying over a pre-trained encoder to make training considerably more efficient. Gaze estimation can be applied in the real world, for example, to control a computer mouse without physical movement or to detect gaze patterns that help diagnose neurodevelopmental disorders such as Attention Deficit Hyperactivity Disorder (ADHD), which can otherwise be difficult to identify in young children. The proposed method produced more accurate results than state-of-the-art mechanisms, achieving an angular error of 7.4 degrees when trained and tested on the EVE dataset.
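
To make the cross-encoder swapping idea concrete, the following is a minimal sketch, not the paper's implementation; the network layers, latent dimensions, and the choice of which code is swapped are assumptions made only for illustration. Each image is encoded into a gaze code and an appearance code, the gaze codes of two paired views are exchanged, and both views are reconstructed, which pressures the gaze code to carry only the information shared across the pair.

```python
import torch
import torch.nn as nn

# Hypothetical latent sizes; the real architecture is not given in the abstract.
GAZE_DIM, APPEARANCE_DIM = 16, 112

class CrossEncoder(nn.Module):
    """Minimal cross-encoder sketch: encode, split the latent, swap, decode."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, GAZE_DIM + APPEARANCE_DIM),
        )
        self.decoder = nn.Sequential(
            nn.Linear(GAZE_DIM + APPEARANCE_DIM, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def split(self, z):
        # Partition the latent vector into a gaze part and an appearance part.
        return z[:, :GAZE_DIM], z[:, GAZE_DIM:]

    def forward(self, img_a, img_b):
        gaze_a, app_a = self.split(self.encoder(img_a))
        gaze_b, app_b = self.split(self.encoder(img_b))
        # Swap the gaze codes between the two paired views before decoding.
        # Reconstructing each view with the other's gaze code encourages the
        # gaze latent to hold only the information the two views share.
        recon_a = self.decoder(torch.cat([gaze_b, app_a], dim=1))
        recon_b = self.decoder(torch.cat([gaze_a, app_b], dim=1))
        return recon_a, recon_b

# Usage sketch with two synchronized 32x32 views of the same subject.
model = CrossEncoder()
view_a, view_b = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
recon_a, recon_b = model(view_a, view_b)
loss = nn.functional.mse_loss(recon_a, view_a) + nn.functional.mse_loss(recon_b, view_b)
```

In the multi-view setting described in the abstract, every pair of synchronized camera views of the same moment could serve as (view_a, view_b), which is what increases the number of swappable pairs; the transfer-learning step would correspond to initializing the encoder from a pre-trained network rather than training it from scratch.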
