Abstract

Gaze following is the task of detecting the point of attention, i.e., the location at which a third person in a single image is gazing. Existing studies have modified network architectures or additionally learned the gaze angle, and have achieved notable performance. However, given a complex scene, these methods often predict incorrect locations because an RGB image lacks depth information. In this paper, we propose a novel three-stage deep neural network that tackles such challenging scenes using a depth map. We achieve state-of-the-art performance on the GazeFollow dataset and demonstrate the potential of depth information for image interpretation. Moreover, a qualitative comparison shows that our method works stably and accurately on complex scenes similar to those found in real-world photographs.
