Abstract

Recent years have witnessed significant progress in the detection of basic human actions. However, most existing methods either rely on assumptions such as known spatial locations and temporal segmentations, or employ computationally expensive techniques such as sliding-window search through a spatio-temporal volume. Such methods struggle to scale to the challenges of real applications such as video surveillance. In this paper, we present an efficient and practical approach to detecting basic human actions, such as making cell phone calls, putting down objects, and hand-pointing, which we have extensively tested on the challenging 2008 TRECVID surveillance event detection dataset. We propose a novel action representation scheme using a set of motion edge history images, which not only encodes both the shape and motion patterns of actions without relying on precise alignment of human figures, but also facilitates the learning of fast tree-structured boosting classifiers. Our approach is robust to cluttered backgrounds as well as scale and viewpoint changes. It is also computationally efficient, leveraging human detection and tracking to reduce the search space. We demonstrate promising results on the 50-hour TRECVID development set as well as on two widely used action recognition benchmarks, the KTH and Weizmann datasets.
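The abstract does not spell out how a motion edge history image is computed, but a common construction (following the motion history image idea of Bobick and Davis) accumulates edge responses of frame differences under a temporal decay. The sketch below illustrates that assumption in Python with OpenCV; the function name update_mehi, the decay factor, the Canny thresholds, and the input file surveillance.avi are all illustrative placeholders, not the paper's actual parameters.

```python
import cv2
import numpy as np

DECAY = 0.9  # assumed per-frame decay factor (not specified in the abstract)

def update_mehi(mehi, prev_gray, curr_gray):
    """Accumulate edges of the frame difference into a decaying history image."""
    diff = cv2.absdiff(curr_gray, prev_gray)       # temporal difference
    edges = cv2.Canny(diff, 50, 150)               # motion edges (assumed thresholds)
    # Keep the stronger of the decayed history and the new edge response
    return np.maximum(mehi * DECAY, edges.astype(np.float32))

cap = cv2.VideoCapture("surveillance.avi")         # placeholder input video
ok, frame = cap.read()
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
mehi = np.zeros_like(prev, dtype=np.float32)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mehi = update_mehi(mehi, prev, curr)
    prev = curr
```

In such a scheme, recent motion edges appear bright and older ones fade, so a single image summarizes both where edges are (shape) and how they have moved (motion), which is consistent with the dual shape/motion encoding the abstract describes.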
