Abstract

Attention mechanisms have been found effective for human gaze estimation, and the attention and diversity of the learned features are two important aspects of an attention mechanism. However, the traditional attention mechanisms used in existing gaze models tend to rely on first-order information, which is attentive but not diverse. Although existing bilinear pooling-based attention can overcome this shortcoming of traditional attention, it is limited in its ability to extract high-order contextual information. We therefore introduce a novel bilinear pooling-based attention mechanism that extracts second-order contextual information through the interaction between local deep learned features. To make the gaze-related features robust to spatial misalignment, we further propose an attention-in-attention method, which consists of global average pooling and an inner attention on the second-order features. For gaze estimation, we then propose a new bilinear pooling-based attention network with attention-in-attention. Extensive evaluation shows that our method surpasses the state of the art by a large margin.
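
As a rough illustration of the ideas named in the abstract, the following dependency-free Python sketch computes second-order features as outer products of two linear projections of each local feature, then applies global average pooling and an inner attention to produce one weight per spatial location. Everything here is an assumption made for illustration: the abstract does not specify the architecture, so the projection matrices `w_u` and `w_v`, the function names, the sizes, and the particular form of the inner attention (a softmax over the pooled second-order channels) are hypothetical and should not be read as the authors' method.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def project(x, w):
    """Multiply one feature vector x (length C) by a C x d matrix w."""
    return [sum(x[k] * w[k][j] for k in range(len(x))) for j in range(len(w[0]))]


def second_order_features(feats, w_u, w_v):
    """Flattened outer product of two projections at each location: N x (d*d)."""
    out = []
    for x in feats:
        u, v = project(x, w_u), project(x, w_v)
        out.append([ui * vj for ui in u for vj in v])
    return out


def attention_in_attention(second_order):
    """Return one attention weight per spatial location (weights sum to 1)."""
    n = len(second_order)
    dim = len(second_order[0])
    # Global average pooling over all spatial locations.
    pooled = [sum(row[j] for row in second_order) / n for j in range(dim)]
    # Inner attention over second-order channels (assumed form, not from the paper).
    inner = softmax(pooled)
    # Score each location by its inner-attention-weighted second-order response.
    scores = [sum(r * a for r, a in zip(row, inner)) for row in second_order]
    return softmax(scores)


def attend(feats, w_u, w_v):
    """Compute spatial weights and the attention-weighted sum of local features."""
    weights = attention_in_attention(second_order_features(feats, w_u, w_v))
    channels = len(feats[0])
    pooled = [sum(w * x[c] for w, x in zip(weights, feats)) for c in range(channels)]
    return weights, pooled


# Hypothetical sizes: a 7x7 feature map (49 locations), 8 channels, rank-3 projections.
random.seed(0)
N, C, D = 49, 8, 3
feats = [[random.gauss(0, 1) for _ in range(C)] for _ in range(N)]
w_u = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(C)]
w_v = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(C)]
weights, pooled = attend(feats, w_u, w_v)
```

Here `weights` holds one attention value per location and `pooled` is the attention-weighted sum of the original local features. In a trained network the projections would presumably be learned and the result passed on to a gaze regressor, neither of which this sketch attempts.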
