In the Eye of the Beholder: Gaze and Actions in First Person Video.

Yin Li, Miao Liu, James M. Rehg

doi:10.1109/tpami.2021.3051319

Abstract

We address the task of jointly determining what a person is doing and where they are looking based on the analysis of video captured by a head-worn camera. To facilitate our research, we first introduce the EGTEA Gaze+ dataset. Our dataset comes with videos, gaze tracking data, hand masks, and action annotations, thereby providing the most comprehensive benchmark for First Person Vision (FPV). Moving beyond the dataset, we propose a novel deep model for joint gaze estimation and action recognition in FPV. Our method describes the participant's gaze as a probabilistic variable and models its distribution using stochastic units in a deep network. We further sample from these stochastic units, generating an attention map to guide the aggregation of visual features for action recognition. Our method is evaluated on our EGTEA Gaze+ dataset and achieves a performance level that exceeds the state of the art by a significant margin. More importantly, we demonstrate that our model can be applied to a larger-scale FPV dataset, EPIC-Kitchens, even without using gaze, offering new state-of-the-art results on FPV action recognition.
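The sampling-and-aggregation idea in the abstract can be illustrated with a small sketch. The abstract does not say how the stochastic units are sampled, so this example assumes a Gumbel-softmax relaxation over spatial gaze logits; the function names `sample_attention_map` and `attention_pool`, the `temperature` parameter, and the nested-list data layout are all hypothetical and chosen only for illustration. It is not the authors' implementation.

```python
import math
import random


def sample_attention_map(logits, temperature=1.0, rng=random):
    """Draw a soft attention map from spatial gaze logits (H x W nested list).

    Assumption: Gumbel-softmax sampling; the abstract only says the gaze
    distribution is modeled with stochastic units.
    """
    flat = [v for row in logits for v in row]
    noisy = []
    for v in flat:
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u clamped away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((v - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over all spatial locations.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(logits[0])
    return [weights[i:i + width] for i in range(0, len(weights), width)]


def attention_pool(features, attention):
    """Aggregate H x W x C features into a length-C vector using attention weights."""
    channels = len(features[0][0])
    pooled = [0.0] * channels
    for feat_row, att_row in zip(features, attention):
        for feat, weight in zip(feat_row, att_row):
            for c in range(channels):
                pooled[c] += weight * feat[c]
    return pooled


if __name__ == "__main__":
    gaze_logits = [[0.0, 1.0], [2.0, 0.5]]          # toy 2 x 2 gaze scores
    feats = [[[1.0, 0.0], [0.0, 1.0]],
             [[1.0, 1.0], [0.5, 0.5]]]              # toy 2 x 2 x 2 features
    att = sample_attention_map(gaze_logits, temperature=0.5)
    print(attention_pool(feats, att))
```

Each call draws a different attention map, so the pooled feature varies from sample to sample; a lower `temperature` concentrates the weight on fewer locations. In a trained network the logits and features would come from learned layers rather than hand-written lists.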

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Jan 15, 2021
Citations: 47

Similar Papers

In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video
Yin Li ... James M Rehg
01 Jan 2018

Ontology evolution for personalised and adaptive activity recognition
Muhammad Safyan ... Sohail Sarwar
IET Wireless Sensor Systems | VOL. 9
01 Aug 2019

AMIR
Shinan Liu ... John Paparrizos
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies | VOL. 7
27 Mar 2023

ADR-SPLDA: Activity discovery and recognition by combining sequential patterns and latent Dirichlet allocation
Belkacem Chikhaoui ... Hélène Pigot
Pervasive and Mobile Computing | VOL. 8
10 Aug 2012
