Recognizing 50 human action categories of web videos

Kishore K Reddy, Mubarak Shah

doi:10.1007/s00138-012-0450-4

Abstract

Action recognition on large categories of unconstrained videos taken from the web is a very challenging problem compared to datasets like KTH (6 actions), IXMAS (13 actions), and Weizmann (10 actions). Challenges like camera motion, different viewpoints, large interclass variations, cluttered background, occlusions, bad illumination conditions, and poor quality of web videos cause the majority of the state-of-the-art action recognition approaches to fail. Also, an increased number of categories and the inclusion of actions with high confusion add to the challenges. In this paper, we propose using the scene context information obtained from moving and stationary pixels in the key frames, in conjunction with motion features, to solve the action recognition problem on a large (50 actions) dataset with videos from the web. We perform a combination of early and late fusion on multiple features to handle the very large number of categories. We demonstrate that scene context is a very important feature to perform action recognition on very large datasets. The proposed method does not require any kind of video stabilization, person detection, or tracking and pruning of features. Our approach gives good performance on a large number of action categories; it has been tested on the UCF50 dataset with 50 action categories, which is an extension of the UCF YouTube Action (UCF11) dataset containing 11 action categories. We also tested our approach on the KTH and HMDB51 datasets for comparison.
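The abstract mentions combining early and late fusion across multiple features but does not give the details. As a rough illustration only, the sketch below shows one common reading of the two terms: early fusion concatenates descriptors before classification, and late fusion combines per-class scores from separately trained classifiers. The function names, the toy numbers, and the product rule used for late fusion are all assumptions made for this example; they are not taken from the paper.

```python
from typing import List, Sequence


def early_fusion(descriptors: Sequence[Sequence[float]]) -> List[float]:
    """Early fusion (illustrative): concatenate descriptors into one vector."""
    fused: List[float] = []
    for descriptor in descriptors:
        fused.extend(descriptor)
    return fused


def late_fusion(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """Late fusion (illustrative): multiply per-class scores, then renormalize.

    The product rule is an assumption of this sketch, not the paper's stated rule.
    """
    n_classes = len(score_lists[0])
    combined = [1.0] * n_classes
    for scores in score_lists:
        if len(scores) != n_classes:
            raise ValueError("all classifiers must score the same classes")
        combined = [c * s for c, s in zip(combined, scores)]
    total = sum(combined)
    if total == 0.0:
        # Fall back to a uniform distribution when every product is zero.
        return [1.0 / n_classes] * n_classes
    return [c / total for c in combined]


# Hypothetical scene descriptors from moving and stationary pixels.
moving_pixels = [0.2, 0.8]
stationary_pixels = [0.5, 0.5, 0.0]
scene_descriptor = early_fusion([moving_pixels, stationary_pixels])

# Hypothetical per-class probabilities from a motion classifier and a
# scene-context classifier over three action classes.
motion_scores = [0.6, 0.3, 0.1]
scene_scores = [0.2, 0.7, 0.1]
fused_scores = late_fusion([motion_scores, scene_scores])
predicted_class = fused_scores.index(max(fused_scores))
```

In this toy setup the fused descriptor would feed a single scene classifier, while the late-fusion step lets a confident scene cue outweigh a weaker motion cue for the same clip.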


Journal: Machine Vision and Applications
Publication Date: Nov 16, 2012
Citations: 580

Similar Papers

Deep Learning Approach for Human Action Recognition Using a Time Saliency Map Based on Motion Features Considering Camera Movement and Shot in Video Image Sequences
Abdorreza Alavigharahbagh ... Vahid Hajihashemi
Information | VOL. 14
15 Nov 2023

Various frameworks for integrating image and video streams for spatiotemporal information learning employing 2D–3D residual networks for human action recognition
Shaimaa Yosry ... Rania R Ziedan
Discover Applied Sciences | VOL. 6
18 Mar 2024

A Context Knowledge Map Guided Coarse-to-fine Action Recognition.
Yanli Ji ... Heng Tao Shen
IEEE transactions on image processing : a publication of the IEEE Signal Processing Society | VOL. 29
12 Nov 2019

Action recognition with appearance–motion features and fast search trees
K Mikolajczyk ... H Uemura
Computer Vision and Image Understanding | VOL. 115
12 Nov 2010
