Abstract

Human intention prediction is of great significance in many applications, such as human-robot interaction, intelligent rehabilitation robots. This paper studies the problem of short-term next-active-object prediction in egocentric images. The short-term next-active-object refers to the object that a human is going to interact with in the short-term future, which is an embodiment of human intention. Most current methods usually use object-centered cues, such as the deviation of object appearance change and the unique shape of the egocentric object trajectory, to predict the next-active-object. In this paper, inspired by the fact that human intention is also revealed by human-centered cues, we propose a deep neural network model that integrates the cues from visual attention and hand positions to predict the next-active-object. Firstly, the probability maps of visual attention and hand positions are constructed, and then the probability distribution of next-active-object is generated. We experimentally compare our method with several baseline methods using two datasets and confirm its effectiveness. In addition, ablation experiments are conducted, and crucial points concerning the next-active-object are discussed.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call