In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond

Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg

doi:10.1007/s11263-023-01879-7

Abstract

Predicting human gaze from egocentric videos plays a critical role in understanding human intention in daily activities. In this paper, we present the first transformer-based model to address the challenging problem of egocentric gaze estimation. We observe that the connection between the global scene context and local visual information is vital for localizing the gaze fixation from egocentric video frames. To this end, we design the transformer encoder to embed the global context as one additional visual token and further propose a novel global–local correlation module to explicitly model the correlation between the global token and each local token. We validate our model on two egocentric video datasets, EGTEA Gaze+ and Ego4D. Our detailed ablation studies demonstrate the benefits of our method. In addition, our approach exceeds the previous state-of-the-art model by a large margin. We also apply our model to a novel gaze saccade/fixation prediction task and the traditional action recognition problem. The consistent gains suggest the strong generalization capability of our model. We also provide additional visualizations to support our claim that global–local correlation serves as a key representation for predicting gaze fixation from egocentric videos. More details can be found on our website (https://bolinlai.github.io/GLC-EgoGazeEst).
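The idea of correlating one global token with each local token can be illustrated with a minimal sketch. This is not the authors' module: it assumes the global token is a mean pool of the local tokens and that correlation is a scaled dot product followed by a softmax, and the function names and toy embeddings are hypothetical.

```python
import math


def mean_pool(tokens):
    """Average the local tokens into one global token (assumed pooling)."""
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_local_correlation(local_tokens):
    """Weight each local token by its similarity to the global token.

    Assumption: similarity is a scaled dot product, as in standard
    attention; the paper's actual module may be defined differently.
    """
    global_token = mean_pool(local_tokens)
    scale = math.sqrt(len(global_token))
    scores = [
        sum(g * x for g, x in zip(global_token, tok)) / scale
        for tok in local_tokens
    ]
    return softmax(scores)


# Four hypothetical 2-D patch embeddings standing in for local tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = global_local_correlation(tokens)
```

In this toy case the third token is most aligned with the pooled context and receives the largest weight, while the all-zero token receives the smallest.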

Journal: International Journal of Computer Vision
Publication Date: Oct 18, 2023
Citations: 8
License type: CC BY 4.0

Similar Papers

Activity Recognition in Egocentric Life-Logging Videos
Sibo Song ... Joo-Hwee Lim
01 Jan 2015

Egocentric Video Search via Physical Interactions
Taiki Miyanishi ... Quan Kong
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 30
21 Feb 2016

EvIs-Kitchen: Egocentric Human Activities Recognition with Video and Inertial Sensor Data
Yuzhe Hao ... Kuniaki Uto
01 Jan 2023

Learning to Recognize Actions on Objects in Egocentric Video With Attention Dictionaries.
Swathikiran Sudhakaran ... Oswald Lanz
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 45
11 Feb 2021
