Abstract

In this article, we propose a dataset named Wheelchair-OmniGaze and a method for estimating the gaze point of a wheelchair user that can cope with large head motions, as a first step toward understanding the user's intention. In the proposed method, the wheelchair user's face is observed by two remote cameras, called face cameras, mounted on the two front sides of the wheelchair, while a spherical camera with a 360° field of view, called a spherical scene camera, is mounted at the top rear to observe the surrounding scene. The user's gaze point in the environment is determined by combining the facial information acquired by the face cameras with the spherical image acquired by the spherical scene camera. Concretely, a convolutional neural network–long short-term memory (CNN-LSTM) network is used to learn gaze points on the spherical image from the two face image sequences. Using two face cameras enables the user's face to be observed even under large head motions. Using a spherical scene camera mounted on the wheelchair, the user's gaze points can be represented in a wheelchair-centered spherical camera coordinate system, which is independent of the user's head motion. The experimental results show that our method achieves performance competitive with the state-of-the-art method. Since our problem definition differs from that of existing research, the experimental results of this article can be regarded as a baseline for our dataset.
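
The abstract does not specify the network details, so the following is only a minimal sketch, in PyTorch, of how a two-stream CNN-LSTM gaze estimator of this kind could be structured. The ResNet-18 backbone, the LSTM hidden size, the class name TwoStreamCNNLSTMGaze, and the output parameterization as (azimuth, elevation) on the spherical image are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a two-stream CNN-LSTM gaze estimator: each face
# camera's image sequence is encoded frame by frame with a shared CNN backbone,
# the two per-frame features are concatenated and fed to an LSTM, and the last
# hidden state is regressed to a gaze point (azimuth, elevation) on the
# spherical scene image. All architectural choices here are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class TwoStreamCNNLSTMGaze(nn.Module):
    def __init__(self, hidden_size=256):
        super().__init__()
        # Shared CNN backbone for both face cameras (ResNet-18 is an assumption).
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features          # 512 for ResNet-18
        backbone.fc = nn.Identity()                 # keep the 512-d feature vector
        self.cnn = backbone
        # LSTM over the concatenated left/right face features of each frame.
        self.lstm = nn.LSTM(input_size=2 * feat_dim,
                            hidden_size=hidden_size,
                            batch_first=True)
        # Regress the final hidden state to (azimuth, elevation) in radians.
        self.head = nn.Linear(hidden_size, 2)

    def forward(self, left_seq, right_seq):
        # left_seq, right_seq: (batch, time, 3, H, W) face image sequences.
        b, t = left_seq.shape[:2]
        feats = []
        for seq in (left_seq, right_seq):
            x = seq.reshape(b * t, *seq.shape[2:])
            feats.append(self.cnn(x).reshape(b, t, -1))   # (batch, time, feat_dim)
        fused = torch.cat(feats, dim=-1)                   # (batch, time, 2*feat_dim)
        out, _ = self.lstm(fused)
        return self.head(out[:, -1])                       # (batch, 2) gaze point


if __name__ == "__main__":
    model = TwoStreamCNNLSTMGaze()
    left = torch.randn(2, 8, 3, 224, 224)    # 8-frame sequence from each face camera
    right = torch.randn(2, 8, 3, 224, 224)
    print(model(left, right).shape)           # torch.Size([2, 2])
```

Sharing one backbone between the two face cameras keeps the parameter count low, while the LSTM lets the model exploit the temporal continuity of head motion rather than treating each frame in isolation.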
