Abstract
Group activity recognition is a challenging task because there is an exponentially large number of semantic and geometric relationships among individuals. This makes it difficult to model these interactions and merge them into a whole for group activity classification. In this paper, we propose a deep fully-connected model for group activity recognition. First, we use a spatial-temporal model based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network to capture the dynamic features of each person. Then, we use a fully-connected conditional random field (FCCRF) to learn the interactions between people. Finally, we use the FCCRF potential functions to refine the activity predictions of the spatial-temporal model. Experimental results on the Collective Activity dataset and the Collective Activity Extended dataset show that our model improves recognition accuracy over baseline methods and achieves results competitive with state-of-the-art models.
Highlights
In recent years, vision-based human activity recognition has become an active research direction.
The main contributions of this paper are as follows: (1) we propose a graphical framework based on a deep learning network to model the interactions between people in a group activity; (2) we use a fully-connected conditional random field (CRF) model to correct the prediction errors generated by the deep learning network. The remainder of the paper is organized as follows: Section 2 reviews related work on group activity recognition; Section 3 describes the conditional random field model based on the deep learning network; Section 4 analyzes behavior classification; Section 5 briefly introduces model training; and Section 6 presents the experimental results and compares them with other models.
After the video image is processed by the spatial-temporal model based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network, the output contains the preliminary observation information and behavior category of each person in the image.
Summary
Vision-based human activity recognition has become an active research direction. The proposed approach has three steps: a CNN and an LSTM network obtain the dynamic information of each person's behavior as well as a preliminary prediction of the group activity; a conditional random field (CRF) graphical model describes the interactions between people in the group; and the knowledge from both the CRF and the deep learning network is integrated to refine the individual and group activity recognition. The main contributions of this paper are as follows: (1) we propose a graphical framework based on a deep learning network to model the interactions between people in a group activity; (2) we use a fully-connected conditional random field model to correct the prediction errors generated by the deep learning network. The remainder of the paper is organized as follows: Section 2 reviews related work on group activity recognition; Section 3 describes the conditional random field model based on the deep learning network; Section 4 analyzes behavior classification; Section 5 briefly introduces model training; and Section 6 presents the experimental results and compares them with other models.
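To make the refinement step concrete, the following is a minimal sketch of how per-person predictions from a CNN and LSTM stage could be refined with a fully connected CRF. It is not the paper's implementation: the mean-field update, the Potts-style pairwise term (neighbours reward the same label), the Gaussian affinity on person positions, and the names `refine_with_dense_crf`, `group_label`, `pair_weight`, and `bandwidth` are all assumptions chosen for illustration. The text above does not specify the actual potential functions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_with_dense_crf(unary_logits, positions, n_iters=5,
                          pair_weight=1.0, bandwidth=1.0):
    """Refine per-person label distributions with mean-field inference.

    unary_logits: one list of activity scores per person (assumed to come
                  from the CNN and LSTM stage).
    positions:    one (x, y) location per person (assumed input).
    The pairwise term is an illustrative Potts-style potential, not the
    potential function used in the paper.
    """
    n = len(unary_logits)
    n_labels = len(unary_logits[0])

    # Fully connected graph: every pair of people gets a Gaussian affinity.
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2
                         for a, b in zip(positions[i], positions[j]))
                affinity[i][j] = math.exp(-d2 / (2.0 * bandwidth ** 2))

    # Start from the network's own predictions.
    q = [softmax(u) for u in unary_logits]

    for _ in range(n_iters):
        new_q = []
        for i in range(n):
            scores = []
            for label in range(n_labels):
                # Support for this label from all other people.
                support = sum(affinity[i][j] * q[j][label]
                              for j in range(n) if j != i)
                scores.append(unary_logits[i][label] + pair_weight * support)
            new_q.append(softmax(scores))
        q = new_q  # parallel update of all people
    return q


def group_label(q):
    """Pick the group activity as the label with the largest summed marginal
    (an assumed aggregation rule)."""
    n_labels = len(q[0])
    totals = [sum(person[label] for person in q) for label in range(n_labels)]
    return totals.index(max(totals))
```

For example, with three nearby people where two are confidently assigned one activity and the third weakly prefers another, the pairwise term pulls the third person's label toward that of the neighbours, which is the kind of error correction the CRF stage is meant to provide. Larger values of `pair_weight` make the group context dominate the individual predictions.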