Abstract
Continuous action recognition in video is more challenging than traditional isolated action recognition. In this paper, we propose a hybrid framework combining a Convolutional Neural Network (CNN) and a Latent-Dynamic Conditional Random Field (LDCRF) to segment and recognize continuous actions simultaneously. Most existing action recognition methods rely on complex handcrafted features, which are highly problem-dependent. We instead use a CNN, a type of deep model, to automatically learn high-level action features directly from raw inputs. The LDCRF models the intrinsic and extrinsic dynamics of actions. The CNN is embedded in the bottom layer of the LDCRF, converting the LDCRF from a shallow to a deep structure. This framework unifies action feature learning and continuous action recognition in a single procedure. The model is trained end-to-end, with the parameters of the CNN and the LDCRF jointly optimized by gradient descent. We evaluate our method on two public datasets, KTH and HumanEva. Experiments show that our method achieves improved recognition accuracy compared with several other methods, and that the features learned by the CNN outperform handcrafted features.
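To illustrate the general idea of a neural feature extractor feeding a chain-structured sequence model, the sketch below is a deliberately minimal stand-in: a single linear projection plays the role of the CNN's per-frame scoring, and Viterbi decoding over a linear-chain model (a simplification of the LDCRF, without latent states) jointly segments and labels a frame sequence. All names and parameters here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def frame_scores(frames, W, b):
    # Stand-in for the CNN: a linear projection from raw per-frame
    # features to per-class unary scores. A real model would use
    # convolutional layers; shape is (T, num_classes).
    return frames @ W + b

def viterbi(unary, transition):
    # Viterbi decoding over a linear chain: combines per-frame unary
    # scores with a class-to-class transition matrix, returning the
    # highest-scoring frame labeling (which implicitly segments the
    # sequence into runs of each action).
    T, C = unary.shape
    score = np.empty((T, C))
    back = np.zeros((T, C), dtype=int)
    score[0] = unary[0]
    for t in range(1, T):
        # total[i, j]: best score ending in class i at t-1, moving to j at t.
        total = score[t - 1][:, None] + transition + unary[t][None, :]
        back[t] = total.argmax(axis=0)
        score[t] = total.max(axis=0)
    # Backtrack from the best final class.
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy sequence: three frames of "action 0" followed by three of "action 1",
# with self-transitions favored so labels stay stable within a segment.
frames = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)
W, b = np.eye(2), np.zeros(2)
transition = np.array([[0.5, -0.5], [-0.5, 0.5]])
labels = viterbi(frame_scores(frames, W, b), transition)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Joint end-to-end training, as described in the abstract, would backpropagate a sequence-level loss through both the chain model's potentials and the feature extractor's weights, rather than decoding with fixed parameters as done here.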