Abstract

We introduce the new problem of gaze anticipation on future frames, which extends the conventional gaze prediction problem beyond the current frame. To solve this problem, we propose a new generative adversarial network (GAN) based model, Deep Future Gaze (DFG), comprising two pathways: DFG-P anticipates gaze prior maps conditioned on the input frame, capturing task influences, while DFG-G learns to model both semantic and motion information for future frame generation. The outputs of DFG-P and DFG-G are then fused to anticipate future gaze. DFG-G consists of two networks: a generator and a discriminator. The generator uses a two-stream spatio-temporal convolutional architecture (3D-CNN) that explicitly untangles foreground from background to generate future frames; another 3D-CNN is then attached to anticipate gaze on these synthetic frames. The discriminator plays against the generator by distinguishing the generator's synthetic frames from real frames. Experimental results on publicly available egocentric and third-person video datasets show that DFG significantly outperforms all competing baselines. We also demonstrate that DFG outperforms state-of-the-art methods at gaze prediction on current frames in both egocentric and third-person videos.
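
To make the generator's two-stream design concrete, below is a minimal, illustrative PyTorch sketch of the idea: a 3D-CNN foreground stream that produces moving foreground frames plus a soft mask, a static background stream, mask-based composition into a short clip of future frames, and a small 3D-CNN gaze head applied to the synthetic clip. All layer sizes, names (e.g., TwoStreamFutureGazeGenerator), the 8-frame horizon, and the gaze-head design are assumptions for illustration only, not the authors' implementation; the DFG-P prior pathway and the adversarial discriminator are omitted.

# Minimal sketch (PyTorch) of the two-stream 3D-CNN generator idea; sizes are assumptions.
import torch
import torch.nn as nn

class TwoStreamFutureGazeGenerator(nn.Module):
    """Encodes one observed frame, decodes future frames as a composite of a
    moving foreground stream and a static background stream, and predicts a
    gaze heat map on each synthetic frame."""

    def __init__(self):
        super().__init__()
        # 2D encoder for the single observed frame (3 x 64 x 64 assumed).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )  # -> (B, 256, 8, 8)

        # Foreground stream: 3D transposed convolutions grow time and space together.
        self.fg_stream = nn.Sequential(
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )  # -> (B, 32, 8, 64, 64): 8 future frames
        self.fg_frames = nn.Sequential(nn.Conv3d(32, 3, 3, padding=1), nn.Tanh())
        self.fg_mask = nn.Sequential(nn.Conv3d(32, 1, 3, padding=1), nn.Sigmoid())

        # Background stream: 2D transposed convolutions produce one static frame
        # that is repeated over time.
        self.bg_stream = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )  # -> (B, 3, 64, 64)

        # Gaze head: a small 3D-CNN over the synthetic clip, one heat map per frame.
        self.gaze_head = nn.Sequential(
            nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(16, 1, 3, padding=1),
        )

    def forward(self, frame):
        feat = self.encoder(frame)                   # (B, 256, 8, 8)
        feat3d = feat.unsqueeze(2)                   # add a time dim: (B, 256, 1, 8, 8)

        fg = self.fg_stream(feat3d)                  # (B, 32, 8, 64, 64)
        fg_frames = self.fg_frames(fg)               # (B, 3, 8, 64, 64)
        mask = self.fg_mask(fg)                      # (B, 1, 8, 64, 64)

        bg = self.bg_stream(feat)                    # (B, 3, 64, 64), static background
        bg = bg.unsqueeze(2).expand(-1, -1, fg_frames.size(2), -1, -1)

        # Compose future frames: mask selects foreground, (1 - mask) keeps background.
        video = mask * fg_frames + (1.0 - mask) * bg     # (B, 3, 8, 64, 64)

        # Per-frame gaze heat maps, normalized over spatial locations.
        gaze = self.gaze_head(video)                     # (B, 1, 8, 64, 64)
        gaze = torch.softmax(gaze.flatten(3), dim=-1).view_as(gaze)
        return video, gaze

# Usage: one observed frame in, a synthetic 8-frame clip and per-frame gaze maps out.
frame = torch.randn(2, 3, 64, 64)
video, gaze = TwoStreamFutureGazeGenerator()(frame)
print(video.shape, gaze.shape)  # (2, 3, 8, 64, 64) and (2, 1, 8, 64, 64)

In the full model, these anticipated gaze maps would be fused with the DFG-P prior maps, and the synthetic clip would be scored by a discriminator trained to tell it apart from real future frames.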
