Abstract

Experimental observations show a temporal correlation between consecutive gaze positions in visual behavior. Previous studies on gaze point estimation usually use individual images as training input, without taking the sequential relationship between image frames into account. In this paper, temporal features are considered in addition to spatial features to improve accuracy, by using videos rather than still images as the input data. To capture spatial and temporal features at the same time, a convolutional neural network (CNN) and a long short-term memory (LSTM) network are combined to build the training model: the CNN extracts spatial features, and the LSTM correlates temporal features. This paper presents a CNN Concatenating LSTM Network (CCLN) that concatenates spatial and temporal features to improve the performance of gaze estimation when time-series videos are used as the training data. In addition, the proposed model is optimized by exploring the number of LSTM layers and the influence of batch normalization (BN) and the global average pooling (GAP) layer on the CCLN. Because larger amounts of training data generally lead to better models, we also propose a method for constructing video datasets for gaze point estimation, providing data for training and prediction. The effectiveness of different commonly used general models and the impact of transfer learning are also studied. Exhaustive evaluation shows that the proposed method achieves better prediction accuracy than existing CNN-based methods: the best model reaches 93.1% accuracy, and the general model MobileNet reaches 92.6%.
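
To make the described architecture concrete, the following is a minimal sketch of a CNN-plus-LSTM gaze estimator in TensorFlow/Keras. It is not the authors' exact CCLN: the layer sizes, the 64x64 frame-sequence input, and the two-coordinate regression head are illustrative assumptions; it only shows the general pattern of per-frame spatial features (with BN and GAP) fused by concatenation with LSTM temporal features.

```python
# Minimal sketch of a CNN+LSTM gaze estimator (illustrative, not the authors' exact CCLN).
# Assumptions: input is a sequence of SEQ_LEN eye/face frames of size 64x64x3,
# and the output is a 2-D gaze point (x, y) in normalized screen coordinates.
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, H, W, C = 8, 64, 64, 3

def build_cnn_backbone():
    """Per-frame CNN that extracts spatial features, using BN and GAP."""
    inp = layers.Input(shape=(H, W, C))
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.GlobalAveragePooling2D()(x)  # GAP instead of Flatten to reduce parameters
    return models.Model(inp, x, name="cnn_backbone")

def build_ccln_like_model():
    seq_in = layers.Input(shape=(SEQ_LEN, H, W, C))
    # Apply the same CNN to every frame: one spatial feature vector per time step.
    spatial_seq = layers.TimeDistributed(build_cnn_backbone())(seq_in)
    # The LSTM summarizes the temporal relationship between consecutive frames.
    temporal = layers.LSTM(128)(spatial_seq)
    # Concatenate the last frame's spatial features with the temporal features.
    last_spatial = layers.Lambda(lambda s: s[:, -1, :])(spatial_seq)
    fused = layers.Concatenate()([last_spatial, temporal])
    out = layers.Dense(2)(fused)  # predicted gaze point (x, y)
    return models.Model(seq_in, out, name="ccln_like")

model = build_ccln_like_model()
model.compile(optimizer="adam", loss="mse")
```

Concatenating the per-frame spatial features with the LSTM output before the regression head mirrors the fusion of spatial and temporal features that gives the CCLN its name.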

Highlights

  • Most people rely on vision to receive information

  • We propose a simple yet effective convolutional neural network (CNN) Concatenating long short-term memory (LSTM) network (CCLN) to explore the performance of gaze estimation when time-series videos are used as the training data

  • We evaluate the performance of the proposed CCLN gaze estimation network

Summary

Introduction

Most people rely on vision to receive information, and fixation is the main way visual information is received. In CNN-based gaze point estimation, still images are usually selected as the training and prediction data, from which spatial features are extracted. There are few research works on gaze estimation that use videos, with their temporal features, as the training data. This paper presents a CCLN (CNN Concatenating LSTM Network) to improve the performance of gaze estimation when time-series videos are used as the training data. We propose a method for constructing a video dataset for gaze point estimation based on concatenating spatial and temporal features, and a simple yet effective CNN Concatenating LSTM network (CCLN) to explore the performance of gaze estimation with such time-series input; a sketch of the dataset construction idea follows below. This research also studies the effect of incorporating batch normalization and global average pooling into the designed network on improving prediction accuracy and reducing complexity.
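
As a rough illustration of how a recorded video could be turned into training samples for gaze point estimation, the sketch below slices a synchronized frame sequence into fixed-length windows, each paired with the gaze point recorded for its last frame. The window length, stride, and label alignment are assumptions made for illustration, not the paper's exact construction procedure.

```python
# Illustrative sketch: building (frame-sequence, gaze-point) samples from one session.
# Assumption: frames[i] and gaze_points[i] are already synchronized (same timestamp);
# window length and stride are arbitrary choices here, not the paper's values.
import numpy as np

def make_sequence_samples(frames, gaze_points, seq_len=8, stride=1):
    """frames: (N, H, W, C) array; gaze_points: (N, 2) array of screen coordinates.

    Returns X of shape (M, seq_len, H, W, C) and y of shape (M, 2), where each
    sample is a window of consecutive frames labeled with the gaze point of
    the window's last frame.
    """
    X, y = [], []
    for end in range(seq_len - 1, len(frames), stride):
        start = end - seq_len + 1
        X.append(frames[start:end + 1])
        y.append(gaze_points[end])
    return np.stack(X), np.stack(y)

# Example with dummy data: 100 synchronized frames and gaze points.
frames = np.zeros((100, 64, 64, 3), dtype=np.float32)
gaze_points = np.random.rand(100, 2).astype(np.float32)
X, y = make_sequence_samples(frames, gaze_points)
print(X.shape, y.shape)  # (93, 8, 64, 64, 3) (93, 2)
```
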

Proposed Concatenating Spatial and Temporal Features Method
System
Constructing Datasets by Watching Movies
Synchronizing
The Proposed CCLN Gaze Estimation Network
Numerical Analysis
The Evaluation of the LSTM Block
Dropout and BN
The Comparison of Various Models
Comparison of Various General Models
Comparison of Multi-Layer LSTMs
Transfer Learning
Findings
Conclusions