Abstract
Most recent methods for action/activity recognition, usually based on static classifiers, have achieved improvements by integrating the context of local interest point (IP) features, such as spatio-temporal IPs, by characterising their neighbourhoods at different scales. In this study, the authors propose a new approach that explicitly models the sequential aspect of activities. First, a sliding-window segmentation technique splits the video stream into overlapping short segments. Each window is characterised by a local bag of words of IPs encoded by motion information. A first-layer support vector machine provides, for each window, a vector of conditional class probabilities that summarises all discriminant information relevant for sequence recognition. The sequence of these stochastic vectors is then fed to a hidden conditional random field for inference at the sequence level. The authors also show how their approach can be naturally extended to the problem of joint segmentation and recognition of a sequence of action classes within a continuous video stream. They have tested their model on various human action and activity datasets, and the obtained results compare favourably with the current state of the art.
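The first stage of the pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the window length, stride, vocabulary size, and the synthetic interest-point stream are all assumptions made purely for illustration. It shows how overlapping windows are cut from a video stream and how each window is summarised as an L1-normalised bag-of-words histogram over quantised IP codewords; in the paper, these per-window descriptors would then be scored by the first-layer SVM before the HCRF stage.

```python
import numpy as np

def sliding_windows(n_frames, win, stride):
    """Return (start, end) frame bounds of overlapping segments."""
    return [(s, s + win) for s in range(0, n_frames - win + 1, stride)]

def bag_of_words(labels, vocab_size):
    """L1-normalised histogram of codeword labels for one window."""
    h = np.bincount(labels, minlength=vocab_size).astype(float)
    return h / max(h.sum(), 1.0)

# Toy stream (assumed, for illustration): each interest point carries
# a frame index and a codeword label from a 20-word vocabulary.
rng = np.random.default_rng(0)
frame_of_ip = np.sort(rng.integers(0, 100, size=500))  # 100-frame video
codeword = rng.integers(0, 20, size=500)

# Overlapping 30-frame windows with a 10-frame stride (assumed values).
windows = sliding_windows(n_frames=100, win=30, stride=10)
feats = np.stack([
    bag_of_words(codeword[(frame_of_ip >= s) & (frame_of_ip < e)], 20)
    for s, e in windows
])
print(feats.shape)  # one bag-of-words descriptor per window: (8, 20)
```

Each row of `feats` is the local descriptor for one window; training an SVM with probability outputs on such rows would yield the per-window stochastic vectors that the hidden conditional random field consumes.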