CTA-Net: A gaze estimation network based on dual feature aggregation and attention cross fusion

Chenxing Xia, Yan Zhang, Bin Ge, Wei Wang, Wenjun Zhao, Xiuju Gao, Kuan-Ching Li, Zhanpeng Tao

doi:10.2298/csis231116020x

Abstract

Recent work has demonstrated that the Transformer model is effective for computer vision tasks. However, the global self-attention mechanism used in Transformer models does not adequately consider the local structure and details of images. This can cause a loss of information and local detail and, in gaze estimation tasks, lower estimation accuracy than convolution-based or sequential stacking methods. To address this issue, we propose a parallel CNN-Transformer aggregation network (CTA-Net) for gaze estimation, which fully leverages the strength of the Transformer model in modeling global context and the strength of convolutional neural networks (CNNs) in retaining local details. Specifically, a Transformer and a ResNet are deployed to extract facial and eye information, respectively. Additionally, an attention cross fusion (ACFusion) block is embedded in the CNN branch; it decomposes features along the spatial and channel dimensions to supplement lost features, suppress noise, and help extract eye features more effectively. Finally, a dual-feature aggregation (DFA) module is proposed to effectively fuse the output features of both branches with the help of a feature selection mechanism and a residual structure. Experimental results on the MPIIGaze and Gaze360 datasets demonstrate that our CTA-Net achieves state-of-the-art results.
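The abstract does not specify the internals of the ACFusion block or the DFA module. As a rough illustration only, the sketch below assumes a parameter-free channel-then-spatial gating for the ACFusion-style step and a sigmoid gate plus residual sum for the DFA-style step. All function names (`channel_spatial_attention`, `dual_feature_aggregation`, `demo`), the feature shapes, and the random weights are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, used here to produce gates in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_spatial_attention(x):
    """Hypothetical ACFusion-style reweighting of a (C, H, W) eye feature map.

    Assumption: a parameter-free channel gate (global average pooling + sigmoid)
    followed by a spatial gate (channel average + sigmoid). The real ACFusion
    block is not specified in the abstract.
    """
    channel_gate = sigmoid(x.mean(axis=(1, 2)))      # shape (C,)
    x = x * channel_gate[:, None, None]              # reweight channels
    spatial_gate = sigmoid(x.mean(axis=0))           # shape (H, W)
    return x * spatial_gate[None, :, :]              # reweight spatial positions


def dual_feature_aggregation(face, eye, w_gate):
    """Hypothetical DFA-style fusion of two (D,) branch features.

    Assumption: a gate computed from both branches selects, per dimension,
    how much of each branch to keep; a residual term adds the plain sum back.
    """
    z = np.concatenate([face, eye])                  # shape (2D,)
    g = sigmoid(w_gate @ z)                          # shape (D,), each in (0, 1)
    selected = g * face + (1.0 - g) * eye            # feature selection
    return selected + face + eye                     # residual connection


def demo(seed=0, d=8):
    """Run both sketches on random stand-in features; weights are untrained."""
    rng = np.random.default_rng(seed)
    face_vec = rng.standard_normal(d)                # stand-in for the Transformer face embedding
    eye_map = rng.standard_normal((d, 6, 10))        # stand-in for a ResNet eye feature map
    eye_vec = channel_spatial_attention(eye_map).mean(axis=(1, 2))  # pool to (d,)
    w_gate = 0.1 * rng.standard_normal((d, 2 * d))
    fused = dual_feature_aggregation(face_vec, eye_vec, w_gate)
    w_head = 0.1 * rng.standard_normal((2, d))
    return w_head @ fused                            # two outputs, e.g. pitch and yaw


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns a length-2 array of untrained predictions. With `w_gate` set to zeros, every gate equals 0.5, so the selection step reduces to averaging the two branches before the residual sum is added.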


Journal: Computer Science and Information Systems
Publication Date: Jan 1, 2024
License type: CC BY-NC-ND 4.0

Similar Papers

Eye control system based on convolutional neural network: a review
Jianbin Xiong ... Jiehao Li
Assembly Automation | VOL. 42 | 29 Aug 2022

Application of deep learning for semantic segmentation in robotic prostatectomy: Comparison of convolutional neural networks and visual transformers.
Sahyun Pak ... Wonchul Lee
Investigative and clinical urology | VOL. 65 | 01 Jan 2024

Gaze Estimation Based on Convolutional Structure and Sliding Window-Based Attention Mechanism.
Yujie Li ... Jiahui Chen
Sensors (Basel, Switzerland) | VOL. 23 | 07 Jul 2023

Improving Vision Transformers by Revisiting High-Frequency Components
Jiawang Bai ... Shuicheng Yan
01 Jan 2021
