First Person Vision for Activity Prediction Using Probabilistic Modeling

Shaheena Noor, Vali Uddin

doi:10.22581/muet1982.1804.09

Open Access

Abstract

Identifying activities of daily living is an important area of research with applications in smart homes and healthcare for elderly people. The task is challenging because of human self-occlusion, complex natural environments, and human behavior when performing complicated tasks. Psychological studies show that human gaze is closely linked with the thought process and that we tend to "look" at objects before acting on them. Hence, we use the object information present in gaze images as context and as the basis for activity prediction. Our system is based on HMM (Hidden Markov Models) and trained using ANN (Artificial Neural Network). We begin by extracting motion information from TPV (Third Person Vision) streams and object information from FPV (First Person Vision) cameras. The advantage of FPV is that the object information forms the context of the scene. When this context is included as input to the HMM for activity recognition, the precision increases. For testing, we used two standard datasets, from TUM (Technische Universitaet Muenchen) and GTEA Gaze+ (Georgia Tech Egocentric Activities). In the first round, we trained our ANNs with activity information only; in the second round, we added the object information as well. The precision of the predicted activities increased significantly, from 55.21% to 77.61%, and the accuracy from 85.25% to 93.5%. This confirmed our initial hypothesis that including the actor's focus of attention, in the form of the object seen in FPV, helps to predict activities better.
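The idea of adding gaze-object context to an HMM can be illustrated with a minimal sketch. It decodes the most likely activity sequence with the Viterbi algorithm, first from motion symbols alone and then from motion and object symbols together. Everything in it is an assumption made for illustration: the activity, motion, and object labels, all probability values, and the choice to treat motion and object as conditionally independent given the activity. It does not reproduce the ANN-based training described above or any numbers from the datasets.

```python
import math

# Hypothetical toy model; labels and probabilities are NOT taken from the paper.
ACTIVITIES = ["pour", "stir"]                      # hidden states
START = {"pour": 0.5, "stir": 0.5}                 # P(first activity)
TRANS = {"pour": {"pour": 0.8, "stir": 0.2},       # P(next activity | activity)
         "stir": {"pour": 0.2, "stir": 0.8}}
EMIT_MOTION = {"pour": {"reach": 0.6, "rotate": 0.4},   # P(TPV motion | activity)
               "stir": {"reach": 0.4, "rotate": 0.6}}
EMIT_OBJECT = {"pour": {"cup": 0.9, "spoon": 0.1},      # P(FPV object | activity)
               "stir": {"cup": 0.1, "spoon": 0.9}}


def viterbi(motion_seq, object_seq=None):
    """Return the most likely activity sequence for the observations.

    If object_seq is given, motion and object are assumed conditionally
    independent given the activity (a simplification of this sketch).
    """
    def log_emission(state, t):
        score = math.log(EMIT_MOTION[state][motion_seq[t]])
        if object_seq is not None:
            score += math.log(EMIT_OBJECT[state][object_seq[t]])
        return score

    # Best log-probability of any path ending in each state at time 0.
    log_delta = {s: math.log(START[s]) + log_emission(s, 0) for s in ACTIVITIES}
    backptrs = []

    for t in range(1, len(motion_seq)):
        new_delta, ptrs = {}, {}
        for nxt in ACTIVITIES:
            # Best predecessor for a path that is in state `nxt` at time t.
            prev = max(ACTIVITIES,
                       key=lambda p: log_delta[p] + math.log(TRANS[p][nxt]))
            ptrs[nxt] = prev
            new_delta[nxt] = (log_delta[prev] + math.log(TRANS[prev][nxt])
                              + log_emission(nxt, t))
        log_delta = new_delta
        backptrs.append(ptrs)

    # Backtrack from the best final state.
    state = max(ACTIVITIES, key=lambda s: log_delta[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


motions = ["reach", "reach", "rotate", "reach"]
objects = ["cup", "cup", "spoon", "spoon"]

motion_only = viterbi(motions)            # ambiguous motion cues only
with_context = viterbi(motions, objects)  # motion plus gaze-object context
print(motion_only)
print(with_context)
```

With these toy numbers the motion symbols alone are too ambiguous to reveal the switch between activities, while the object symbols make it visible. This mirrors the hypothesis stated above, but it is not evidence for it.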

Highlights

Identifying activities of daily living is an important area of research with applications in smart homes and healthcare for elderly people
We begin by extracting motion information from TPV (Third Person Vision) streams and object information from FPV (First Person Vision) cameras
Although much research effort has been directed at human activity recognition and significant results have been achieved, many challenges remain with respect to human self-occlusion, complex natural environments, and human behavior when performing complicated tasks [1]

Summary

Introduction

Identifying activities of daily living is an important area of research with applications in smart homes and healthcare for elderly people. The precision of the predicted activities increased significantly, from 55.21% to 77.61%, and the accuracy from 85.25% to 93.5%. This confirmed our initial hypothesis that including the actor's focus of attention, in the form of the object seen in FPV, helps to predict activities better. Although much research effort has been directed at human activity recognition and significant results have been achieved, many challenges remain with respect to human self-occlusion, complex natural environments, and human behavior when performing complicated tasks [1]. We exploit the fact that gaze is strongly linked to human actions and the thought process, and provides a strong cue to what is going on in the mind with respect to goal accomplishment.

Methods

Results

Conclusion