Abstract

The problem of modeling the dynamic structure of human activities is considered. Video is mapped to a semantic feature space, which encodes activity attribute probabilities over time. The binary dynamic system (BDS) model is proposed to jointly learn the distribution and dynamics of activities in this space. This is a non-linear dynamic system that combines binary observation variables and a hidden Gauss-Markov state process, extending both binary principal component analysis and the classical linear dynamic system. A BDS learning algorithm, inspired by the popular dynamic texture model, and a dissimilarity measure between BDSs, which generalizes the Binet-Cauchy kernel, are introduced. To enable the recognition of highly non-stationary activities, the BDS is embedded in a bag-of-words representation. An algorithm is introduced for learning a BDS codebook, enabling the use of the BDS as a visual word for attribute dynamics (WAD). Short-term video segments are then quantized with a WAD codebook, allowing the representation of video as a bag-of-words for attribute dynamics. Video sequences are finally encoded as vectors of locally aggregated descriptors, which summarize the first moments of video snippets on the BDS manifold. Experiments show that this representation achieves state-of-the-art performance on the tasks of complex activity recognition and event identification.
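For concreteness, the following is a minimal sketch of the generative structure the abstract describes: a hidden Gauss-Markov state process driving binary (Bernoulli) attribute observations through a sigmoid link. The exact parameterization used in the paper (e.g., whether a bias term $b$ is included) is an assumption here.

$$
x_{t+1} = A x_t + v_t, \qquad v_t \sim \mathcal{N}(0, Q), \qquad x_1 \sim \mathcal{N}(\mu, S),
$$
$$
P(y_{t,k} = 1 \mid x_t) = \sigma\big((C x_t + b)_k\big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}},
$$

where $x_t \in \mathbb{R}^n$ is the hidden state, $y_t \in \{0,1\}^K$ collects the binary attribute variables at time $t$, and $(A, C, Q, \mu, S, b)$ are the BDS parameters. Loosely, dropping the state dynamics reduces the model to binary PCA, while replacing the Bernoulli-sigmoid observation with a Gaussian one recovers the classical linear dynamic system, consistent with the extension claimed in the abstract.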
