The Effect of Different Deep Network Architectures upon CNN-Based Gaze Tracking

Hui-Hui Chen, Jung-Shyr Wu, Po-Ting Liu, Bor-Jiunn Hwang

doi:10.3390/a13050127

Open Access

https://doi.org/10.3390/a13050127

Journal: Algorithms
Publication Date: May 19, 2020
Citations: 5
License type: CC BY 4.0

Affiliation: Ming Chuan University, National Central University

Abstract

In this paper, we explore the effect of different numbers of convolutional layers, batch normalization (BN), and the global average pooling (GAP) layer on a convolutional neural network (CNN) based gaze tracking system. A novel method is proposed to label participants’ face images with gaze points retrieved from an eye tracker while they watch videos, in order to build a training dataset that is closer to human visual behavior. Participants can move their heads freely, so natural images can be obtained without many restrictions. The labeled data are classified according to the gaze coordinates and the area of interest on the screen. Varied network architectures are then applied to estimate and compare the effects of the number of convolutional layers, BN, and the use of a GAP layer instead of the fully connected layer. Three schemes (single-eye image, double-eye image, and facial image), with data augmentation, are fed into the neural network for training and evaluation. The input image of the eye or face for an eye tracking system is usually small, with relatively few features. The results show that BN and GAP help overcome the difficulty of training models and reduce the number of network parameters. Accuracy improves significantly when GAP and BN are used at the same time. Overall, the face scheme achieves the highest accuracy, 0.883, when BN and GAP are used together. Additionally, compared with the case in which the fully connected layer is set to 512, the number of parameters is reduced by less than 50% and the accuracy is improved by about 2%. A detection accuracy comparison with the existing George and Routray methods shows that the proposed method improves prediction accuracy by more than 6%.
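As a rough illustration of how a gaze coordinate could be turned into a class label for a screen region, the sketch below divides the screen into a uniform grid and returns the index of the cell that contains the gaze point. The function name `gaze_to_region`, the uniform grid, the 1920 x 1080 screen, and the 3 x 3 layout are all assumptions made for this example; the abstract does not state how the authors partition the screen.

```python
def gaze_to_region(x, y, screen_w, screen_h, cols, rows):
    """Map a gaze point (in pixels) to a row-major grid cell index.

    Hypothetical helper: assumes a uniform cols x rows grid over the screen.
    Returns None when the gaze point falls outside the screen.
    """
    if not (0 <= x < screen_w and 0 <= y < screen_h):
        return None
    col = int(x * cols / screen_w)  # column of the cell containing x
    row = int(y * rows / screen_h)  # row of the cell containing y
    return row * cols + col


# Example with an assumed 1920 x 1080 screen split into a 3 x 3 grid.
print(gaze_to_region(960, 540, 1920, 1080, 3, 3))  # centre of the screen
```

Each face or eye image recorded at the same moment as the gaze sample would then carry the returned index as its training label, under these assumptions.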

Highlights

Gaze tracking can help understand cognitive processes and emotional state, and has been applied in many fields, such as medicine, Human-Computer Interaction (HCI), and e-learning [1,2,3]. The techniques of gaze tracking are classified into two methods, model-based and appearance-based [4]. First, the model-based method mainly uses the near-infrared light device to track the pupil position and the designed algorithm to estimate the gaze points which usually require expensive hardware [5]. A simple video-based eye tracking system was developed with one camera and one infrared light source to determine a person’s point of regard (PoR) [6], assuming the location of features in the eye video is known.
We explore the effect of batch normalization (BN) on small training images and a small number of network layers, with the number of convolutional layers (NoCL) varied so that evaluations are performed at C5 to C9.
The evaluations are performed by adjusting the number of parameters and the settings of BN, as well as by using the global average pooling (GAP) layer instead of the fully connected layer.
The training dataset is built as the participant watches videos, because this is closer to the viewer’s natural visual behavior.
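To make the parameter saving from using GAP instead of a fully connected layer concrete, the following sketch counts the trainable parameters of two classifier heads placed on the same final feature map. It is an illustration under stated assumptions, not the authors' architecture: the feature map size (4 x 4 x 128), the 9 output classes, and the choice to follow GAP with a single dense classifier are hypothetical. Only the 512-unit hidden layer echoes the setting mentioned in the abstract.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def fc_head_params(h, w, c, hidden, n_classes):
    """Flatten an h x w x c feature map, then a hidden layer, then a classifier."""
    return dense_params(h * w * c, hidden) + dense_params(hidden, n_classes)


def gap_head_params(c, n_classes):
    """GAP itself has no weights; only the classifier on c channel means counts."""
    return dense_params(c, n_classes)


def global_average_pool(feature_map):
    """Average each channel (a 2-D list of numbers) down to a single value."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]


# Hypothetical sizes, chosen only for illustration (not taken from the paper).
H, W, C, HIDDEN, N_CLASSES = 4, 4, 128, 512, 9
fc_total = fc_head_params(H, W, C, HIDDEN, N_CLASSES)
gap_total = gap_head_params(C, N_CLASSES)
print(fc_total, gap_total)  # 1053705 1161
```

These counts cover only the classifier head. The convolutional layers, which are unchanged between the two variants, are excluded, so the ratio here is not comparable to the overall reduction reported in the abstract.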

Summary

Introduction

Gaze tracking can help understand cognitive processes and emotional state, and has been applied in many fields, such as medicine, Human-Computer Interaction (HCI), and e-learning [1,2,3]. The techniques of gaze tracking are classified into two methods, model-based and appearance-based [4]. The model-based method mainly uses a near-infrared light device to track the pupil position and a purpose-designed algorithm to estimate the gaze points, which usually requires expensive hardware [5]. A simple video-based eye tracking system was developed with one camera and one infrared light source to determine a person’s point of regard (PoR) [6], assuming the location of features in the eye video is known. Zhu et al. [7] used a dynamic head compensation model to compensate for the effect of head movement when estimating the gaze movement. The gaze is calculated by ...
