Abstract

Recognizing human-object interactions in videos is a very challenging problem in computer vision research. The task poses two major difficulties: (1) the detection of human body parts and objects is often degraded by video quality, for instance low resolution, camera motion, and frames blurred by fast motion, as well as by self-occlusion during human-object interactions; and (2) the spatial and temporal dynamics of human-object interactions are hard to model. To overcome these obstacles, we propose a new method for human-object interaction recognition that uses features based on social network analysis (SNA) to describe the distributions and relationships of low-level objects. In this approach, the detected human body parts and objects are treated as nodes in social network graphs, and a set of SNA features, including closeness, centrality, and centrality with relative velocity, is extracted for action recognition. A major advantage of the SNA-based feature set is its robustness to varying numbers of nodes and to erroneous node detections, both of which are very common in human-object interactions. An SNA feature vector is extracted for each frame, and different human-object interactions are classified based on these features. Two classification methods, a Support Vector Machine (SVM) and a Hidden Markov Model (HMM), are used to evaluate the proposed feature set on four different human-object interactions from the HMDB dataset [1]. The experimental results demonstrate that the proposed framework can effectively capture the dynamic characteristics of human-object interactions and outperforms state-of-the-art methods in human-object interaction recognition.
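The abstract does not give formal definitions of the SNA features, so the following is only a minimal sketch under assumed definitions. It treats each detected body part or object in one frame as a 2D point, connects every pair with an edge weighted by Euclidean distance, and computes standard closeness centrality plus a distance-thresholded degree centrality (the `radius` threshold and all function names are hypothetical, not taken from the paper). Pooling each score over nodes (mean and max) gives a fixed-length vector however many nodes were detected, which is one way to obtain the robustness to varying node numbers described above. The velocity-based variant is not sketched here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed (x, y) position of a detected node


def closeness_centrality(nodes: Sequence[Point]) -> List[float]:
    """Closeness of each node in a complete graph with Euclidean edge weights.

    Assumption: every pair of nodes is connected. With Euclidean weights the
    direct edge is always a shortest path, so
    closeness(i) = (n - 1) / sum_j dist(i, j).
    """
    n = len(nodes)
    if n < 2:
        return [0.0] * n
    scores = []
    for i, p in enumerate(nodes):
        total = sum(math.dist(p, q) for j, q in enumerate(nodes) if j != i)
        # Coincident nodes give a zero total; treat them as uninformative.
        scores.append((n - 1) / total if total > 0 else 0.0)
    return scores


def degree_centrality(nodes: Sequence[Point], radius: float) -> List[float]:
    """Fraction of the other nodes within `radius` (hypothetical threshold)."""
    n = len(nodes)
    if n < 2:
        return [0.0] * n
    return [
        sum(1 for j, q in enumerate(nodes) if j != i and math.dist(p, q) <= radius)
        / (n - 1)
        for i, p in enumerate(nodes)
    ]


def frame_features(nodes: Sequence[Point], radius: float = 50.0) -> List[float]:
    """Fixed-length per-frame vector: [mean, max] of each centrality score.

    Pooling over nodes keeps the length at 4 regardless of how many
    body parts and objects were detected in the frame.
    """
    if not nodes:
        return [0.0, 0.0, 0.0, 0.0]
    feats: List[float] = []
    for scores in (closeness_centrality(nodes), degree_centrality(nodes, radius)):
        feats.extend([sum(scores) / len(scores), max(scores)])
    return feats


if __name__ == "__main__":
    # Three hypothetical detections in one frame, e.g. hand, elbow, cup.
    frame = [(0.0, 0.0), (30.0, 40.0), (60.0, 80.0)]
    print(frame_features(frame))
```

In a full pipeline, one such vector per frame would form a sequence that could be scored by per-class HMMs, or pooled further for an SVM; that classification stage is omitted from this sketch.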
