Abstract

Interaction recognition in videos from body pose is attracting considerable attention because of its speed and robustness. Recently proposed methods based on recurrent neural networks (RNNs) and deep ConvNets perform well at learning sequential information. Despite this, RNNs lag behind in learning spatial relations between body parts, while deep ConvNets require large amounts of training data. We propose a traversal-based three-layer neural network (TNN), followed by a pairwise interaction framework (PIF), for interaction recognition. We also propose a novel algorithm for tracking humans across successive frames. The proposed algorithm computes the collective traversal of individual body parts across frames and feeds it to the TNN to learn an effective representation of complex actions. The PIF model combines the confidence scores of the pair of action labels corresponding to an interaction to make the final interaction prediction. We evaluate the approach on two publicly available datasets, UT-Interaction and SBU Kinect Interaction. Results show that the proposed approach outperforms state-of-the-art methods.
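
As a rough illustration of the pairwise combination step, the sketch below scores each interaction from the per-person action confidences and picks the highest. It is not the PIF as specified in the paper: the abstract does not state the combination rule, so the product of the two confidences is an assumption, as are the action labels, the `INTERACTION_PAIRS` mapping, and the function name `predict_interaction`.

```python
# Hypothetical mapping from each interaction to the pair of per-person
# action labels that make it up. These labels are illustrative only.
INTERACTION_PAIRS = {
    "handshake": ("extend_hand", "extend_hand"),
    "punch": ("punch", "recoil"),
    "push": ("push", "recoil"),
}


def predict_interaction(scores_a, scores_b, pairs=INTERACTION_PAIRS):
    """Combine two people's action confidences into interaction scores.

    scores_a, scores_b: dicts mapping action label -> confidence in [0, 1].
    Returns (best_interaction, combined_scores).
    Assumption: the combined score is the product of the two confidences.
    """
    combined = {}
    for interaction, (first, second) in pairs.items():
        # Either person may play either role, so score both assignments
        # and keep the stronger one.
        forward = scores_a.get(first, 0.0) * scores_b.get(second, 0.0)
        backward = scores_a.get(second, 0.0) * scores_b.get(first, 0.0)
        combined[interaction] = max(forward, backward)
    best = max(combined, key=combined.get)
    return best, combined


if __name__ == "__main__":
    person_a = {"extend_hand": 0.2, "punch": 0.7, "push": 0.1, "recoil": 0.0}
    person_b = {"extend_hand": 0.1, "punch": 0.0, "push": 0.1, "recoil": 0.8}
    label, scores = predict_interaction(person_a, person_b)
    print(label, scores)
```

Scoring both role assignments keeps the prediction independent of which tracked person is listed first. Other combination rules, such as a sum or a learned weighting, would fit the same interface.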
