Abstract
Achieving joint segmentation and recognition of continuous actions in a long-term video is a challenging task due to the varying durations of actions and the complex transitions between multiple actions. In this paper, a novel discriminative structural model is proposed for splitting a long-term video into segments and annotating each segment with an action label. A set of state variables is introduced into the model to explore discriminative semantic concepts shared among different actions. To exploit the statistical dependencies among segments, temporal context is captured at both the action level and the semantic-concept level. The state variables are treated as latent information in the discriminative structural model and are inferred during both training and testing. Experiments on the multi-view IXMAS and the realistic Hollywood datasets demonstrate the effectiveness of the proposed method.
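The abstract does not spell out the inference procedure. As a rough illustration of what jointly choosing segment boundaries and action labels with action-level temporal context involves, the sketch below uses a generic semi-Markov Viterbi dynamic program. This technique is swapped in for illustration and is not the authors' model: it omits the latent semantic-concept state variables entirely, and `segment_and_label`, `frame_scores`, `transition`, and `max_len` are hypothetical names and inputs.

```python
from typing import List, Tuple

Segment = Tuple[int, int, int]  # (start frame, end frame exclusive, action label)


def segment_and_label(
    frame_scores: List[List[float]],
    transition: List[List[float]],
    max_len: int,
) -> Tuple[float, List[Segment]]:
    """Jointly pick segment boundaries and action labels (semi-Markov Viterbi).

    Hypothetical inputs, not taken from the paper:
      frame_scores[t][y]: score of action y at frame t.
      transition[a][b]: score for a segment labelled a followed by one labelled b.
      max_len: upper bound on segment duration, in frames.
    """
    T, K = len(frame_scores), len(transition)
    neg = float("-inf")

    # Prefix sums make each segment's score an O(1) lookup.
    prefix = [[0.0] * K]
    for row in frame_scores:
        prefix.append([prefix[-1][y] + row[y] for y in range(K)])

    # best[t][y]: best score for frames [0, t) when the last segment has label y.
    best = [[neg] * K for _ in range(T + 1)]
    # back[t][y]: (start of that last segment, label of the segment before it).
    back = [[None] * K for _ in range(T + 1)]

    for t in range(1, T + 1):
        for y in range(K):
            for s in range(max(0, t - max_len), t):  # s = start of last segment
                seg = prefix[t][y] - prefix[s][y]
                if s == 0:
                    # First segment of the video: no incoming transition.
                    if seg > best[t][y]:
                        best[t][y], back[t][y] = seg, (0, None)
                    continue
                for yp in range(K):
                    # Action-level context: score the label pair (yp -> y).
                    cand = best[s][yp] + transition[yp][y] + seg
                    if cand > best[t][y]:
                        best[t][y], back[t][y] = cand, (s, yp)

    # Backtrack from the best label for the final segment.
    y = max(range(K), key=lambda k: best[T][k])
    score, segments, t = best[T][y], [], T
    while t > 0:
        s, yp = back[t][y]
        segments.append((s, t, y))
        t, y = s, yp
    segments.reverse()
    return score, segments


if __name__ == "__main__":
    # Toy input: frames 0-2 favour action 0, frames 3-5 favour action 1.
    scores = [[1.0, -1.0]] * 3 + [[-1.0, 1.0]] * 3
    trans = [[-5.0, 0.0], [0.0, -5.0]]  # penalise same-label splits
    print(segment_and_label(scores, trans, max_len=6))
    # (6.0, [(0, 3, 0), (3, 6, 1)])
```

Making the diagonal of `transition` negative discourages splitting one action into several same-label segments. The paper's model additionally captures context at the semantic-concept level through latent state variables, which this sketch does not attempt.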