Abstract

Eye gaze understanding plays a crucial role in human social interactions. Gaze has emerged as a powerful tool for many applications, including health assessment, disease diagnosis, and the analysis of human behavior and communication. Eye gaze estimation and prediction have become an active research topic in computer vision over the last decades, with a continuously growing number of application fields. In the last decade, deep neural networks have revolutionized the whole machine learning area, gaze tracking included. Appearance-based models use deep convolutional neural networks (CNNs) to directly estimate the direction of gaze in the camera's frame of reference. This paper analyzes and investigates different CNN architectures for gaze estimation and prediction. Two tasks are addressed in this work: gaze estimation, and gaze prediction based on previously estimated gaze points. In the first task, several CNNs were compared to find the most accurate gaze estimator. In the second task, we predict gaze locations based on the previously estimated gaze vectors while leveraging the spatio-temporal information encoded in previously recorded eye-image sequences. To predict the next gaze locations, we used a Long Short-Term Memory (LSTM) network and Transformers based on self-attention with positional encoding. The different architectures have been trained on the public OPENEDS2020 dataset, which we revisited in order to improve overall performance. We also propose a novel architecture composed of ResNext50 and two fully connected layers. We demonstrate that the proposed architectures obtain results that, in terms of angular error, outperform the state of the art.
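The abstract reports results in terms of angular error but does not define it. The sketch below shows one common way to compute such a metric, assuming gaze directions are given as 3D vectors and the error is the angle between the predicted and ground-truth directions, averaged over samples. The function names `angular_error_deg` and `mean_angular_error_deg` are hypothetical and are not taken from the paper.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze vectors (assumed definition)."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    if norm_p == 0.0 or norm_t == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cosine))


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predictions and ground truths."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)
```

Because both vectors are normalized inside the function, the inputs do not need to be unit length, which is convenient when a network's raw outputs are compared against normalized labels.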
