Abstract

In smart interactive environments, such as digital museums or digital exhibition halls, it is important to accurately understand the user’s intent to ensure successful and natural interaction with the exhibition. For predicting user intent, gaze estimation has been considered one of the most effective indicators among recently developed interaction techniques (e.g., face orientation estimation, body tracking, and gesture recognition). Previous gaze estimation techniques, however, are known to be effective only in a controlled lab environment under normal lighting conditions. In this study, we propose a novel deep learning-based approach to achieve successful gaze estimation under various low-light conditions, which is anticipated to be more practical for smart interaction scenarios. The proposed approach utilizes a generative adversarial network (GAN) to enhance users’ eye images captured under low-light conditions, thereby restoring information needed for gaze estimation. The GAN-recovered images are then fed into a convolutional neural network that estimates the direction of the user’s gaze. Our experimental results on the modified MPIIGaze dataset demonstrate that the proposed approach achieves an average performance improvement of 4.53%–8.9% under low and dark light conditions, which is a promising step toward further research.
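The two-stage pipeline described in the abstract (a GAN generator that enhances a low-light eye image, followed by a CNN that regresses gaze direction) can be sketched as below. Note this is only an illustrative sketch: the module names (`EnhancerG`, `GazeCNN`), layer sizes, and the 36×60 eye-patch resolution are assumptions for demonstration, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class EnhancerG(nn.Module):
    """Illustrative GAN generator: maps a low-light grayscale eye patch
    to an enhanced patch of the same size."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),  # keep pixels in [0, 1]
        )

    def forward(self, x):
        return self.net(x)

class GazeCNN(nn.Module):
    """Illustrative gaze regressor: enhanced eye patch -> 2-D gaze
    direction (e.g., pitch and yaw angles)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(8 * 4 * 4, 2)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# A dummy 36x60 eye patch (an MPIIGaze-style resolution), scaled down
# to simulate a low-light capture.
low_light = torch.rand(1, 1, 36, 60) * 0.1
gaze = GazeCNN()(EnhancerG()(low_light))
print(gaze.shape)  # torch.Size([1, 2])
```

In a real system the generator would be trained adversarially against a discriminator on pairs of low-light and well-lit eye images, and the gaze network would be trained on the enhanced outputs; the sketch only shows how the two stages chain at inference time.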

Highlights

  • Human–computer interaction technologies are becoming increasingly vital for advanced smart interactive systems

  • We ask whether gaze estimation performance can be improved under low-illumination conditions, and validate the feasibility of an approach that adopts generative adversarial network (GAN)-based image enhancement in the loop

  • We propose a generative adversarial network-based approach to improve gaze estimation performance under low and dark light conditions

Introduction

Human–computer interaction technologies are becoming increasingly vital for advanced smart interactive systems. Recent interactive systems attempt to detect user intents expressed in the form of gestures or voice commands using signals from various sensing devices [1,2,3]. More immersive and natural ways to capture user intent include face orientation estimation [4,5,6], body tracking [7,8,9], and estimation of gaze direction [10,11,12,13,14,15,16,17,18,19,20,21]. For example, human activity has been detected by recognizing the joints of the human body with several Kinects in an interactive virtual training environment [8].
