Abstract
In this paper, we explore the effect of varying the number of convolutional layers, batch normalization, and the global average pooling layer in a convolutional neural network (CNN) based gaze tracking system. A novel method is proposed to label the participant’s face images with gaze points retrieved from an eye tracker while the participant watches videos, so that the training dataset is closer to natural human visual behavior. The participants can move their heads freely; therefore, realistic and natural images can be obtained without too many restrictions. The labeled data are classified according to the gaze coordinates and areas of interest on the screen. Varied network architectures are then applied to estimate and compare the effects of the number of convolutional layers, batch normalization (BN), and the global average pooling (GAP) layer used in place of the fully connected layer. Three input schemes, the single-eye image, the double-eye image, and the facial image, each with data augmentation, are fed into the neural network for training and evaluation. The input image of the eye or face in an eye tracking system is typically small and contains relatively few features. The results show that BN and GAP help overcome this problem when training the models and reduce the number of network parameters. The accuracy is significantly improved when GAP and BN are used together. Overall, the face scheme achieves the highest accuracy of 0.883 when BN and GAP are used together. Additionally, compared with the case where the fully connected layer is set to 512 units, the number of parameters is reduced by less than 50% and the accuracy is improved by about 2%. A comparison of our model with the existing methods of George and Routray shows that our proposed method improves prediction accuracy by more than 6%.
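As a minimal, illustrative sketch (not the authors' published architecture), the snippet below shows how a small CNN for small eye or face crops might place BN after each convolution and use GAP in place of a fully connected head. The input size of 64 × 64, the filter counts, the nine-class output, and the name build_gaze_cnn are all assumptions made for this example.

```python
# Sketch of a small gaze-classification CNN with optional BN and a GAP head.
# All hyperparameters here are illustrative assumptions, not the paper's values.
from tensorflow.keras import layers, models

def build_gaze_cnn(input_shape=(64, 64, 3), num_classes=9,
                   num_conv_layers=5, use_bn=True, use_gap=True):
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for i in range(num_conv_layers):
        x = layers.Conv2D(32 * min(2 ** (i // 2), 4), 3, padding="same")(x)
        if use_bn:
            x = layers.BatchNormalization()(x)  # normalize activations before the nonlinearity
        x = layers.ReLU()(x)
        if i % 2 == 1:
            x = layers.MaxPooling2D()(x)        # downsample every second block
    if use_gap:
        x = layers.GlobalAveragePooling2D()(x)  # GAP head: no large dense layer
    else:
        x = layers.Flatten()(x)
        x = layers.Dense(512, activation="relu")(x)  # FC-512 baseline head
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_gaze_cnn()
model.summary()  # compare parameter counts against build_gaze_cnn(use_gap=False)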
Highlights
A novel method is proposed to build a training dataset as the participant watches videos, because this is closer to the viewer’s natural visual behavior.
The evaluations are performed by adjusting the number of convolutional layers (NoCL), the settings of BN, and the use of global average pooling (GAP) instead of the fully connected layer.
The number of convolutional layers is varied from C5 to C9, in which we explore the effect of batch normalization (BN) on a small training image size and a small number of network layers.
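For illustration only, the sketch below enumerates the comparison grid these highlights imply: depths C5 to C9, BN on or off, and a GAP head versus an FC-512 head. The configuration keys and names are hypothetical, not taken from the paper.

```python
# Hedged sketch of the evaluation grid implied by the highlights.
from itertools import product

depths = range(5, 10)              # C5 .. C9 convolutional layers
bn_options = (False, True)         # without / with batch normalization
head_options = ("fc512", "gap")    # fully connected (512 units) vs. global average pooling

for num_conv_layers, use_bn, head in product(depths, bn_options, head_options):
    config = {"num_conv_layers": num_conv_layers, "use_bn": use_bn, "head": head}
    # Train and evaluate one model per configuration, e.g. with the
    # build_gaze_cnn() sketch shown after the abstract.
    print(config)
```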
Summary
Gaze tracking can help understand cognitive processes and emotional state, and has been applied in many fields, such as medicine, Human-Computer Interaction (HCI), and e-learning [1,2,3]. The techniques of gaze tracking are classified into two categories, model-based and appearance-based [4]. The model-based method mainly uses a near-infrared light device to track the pupil position and a designed algorithm to estimate the gaze points, which usually requires expensive hardware [5]. A simple video-based eye tracking system was developed with one camera and one infrared light source to determine a person’s point of regard (PoR) [6], assuming the location of features in the eye video is known. Zhu et al. [7] used a dynamic head compensation model to compensate for the effect of head movement when estimating gaze.