In this study, a new multi‐view human action recognition approach is proposed that exploits low‐dimensional motion information of actions. Before feature extraction, pre‐processing steps are performed to remove noise from the silhouettes caused by imperfect, but realistic, segmentation. Two‐dimensional motion templates based on the motion history image (MHI) are computed for each view/action video. Histograms of oriented gradients (HOGs) are used as an efficient description of the MHIs, which are classified using a nearest neighbor (NN) classifier. Compared with existing approaches, the proposed method has three advantages: (i) it does not require a fixed camera setup during the training and testing stages, so missing camera views can be tolerated; (ii) it has low memory and bandwidth requirements; and hence (iii) it is computationally efficient, which makes it suitable for real‐time action recognition. To the best of the authors' knowledge, this is the first report of results on the MuHAVi‐uncut dataset, which has a large number of action categories and a large set of camera views with noisy silhouettes, and which future researchers can use as a baseline to improve on. Experimental results on this multi‐view dataset give a high accuracy rate of 95.4% using the leave‐one‐sequence‐out cross‐validation technique and compare well with similar state‐of‐the‐art approaches.
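The MHI → HOG → NN pipeline described above can be sketched in a few lines of NumPy. This is a minimal, illustrative sketch, not the authors' implementation: the MHI follows the standard Bobick–Davis update (pixels with motion set to τ, others decayed by 1), the HOG is a simplified version without block normalization, and the toy "actions" (a square moving right vs. down) and labels such as `walk_right` are hypothetical stand-ins for real silhouette sequences.

```python
import numpy as np

def motion_history_image(silhouettes, tau=10):
    """Standard Bobick-Davis MHI from a list of binary silhouette frames:
    pixels where motion occurs are set to tau, all others decay by 1."""
    mhi = np.zeros_like(silhouettes[0], dtype=np.float32)
    prev = silhouettes[0]
    for cur in silhouettes[1:]:
        motion = cur != prev                      # silhouette frame difference
        mhi = np.where(motion, float(tau), np.maximum(mhi - 1.0, 0.0))
        prev = cur
    return mhi / tau                              # normalize to [0, 1]

def hog_descriptor(image, cell=8, bins=9):
    """Simplified HOG: per-cell, magnitude-weighted histograms of unsigned
    gradient orientations (no block normalization)."""
    gy, gx = np.gradient(image.astype(np.float32))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    h, w = image.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            a = ang[i:i + cell, j:j + cell].ravel()
            m = mag[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-6)         # L2-normalized descriptor

def nn_classify(query, train_feats, train_labels):
    """1-NN with Euclidean distance over HOG descriptors of the MHIs."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    return train_labels[int(np.argmin(dists))]

def make_frames(dx, dy, n=8, size=32, start=(4, 4)):
    """Hypothetical toy silhouettes: a 6x6 square translating by (dx, dy)."""
    r, c = start
    frames = []
    for t in range(n):
        f = np.zeros((size, size), dtype=np.uint8)
        f[r + t * dy:r + t * dy + 6, c + t * dx:c + t * dx + 6] = 1
        frames.append(f)
    return frames

# Toy demo: train on one horizontal-motion and one vertical-motion sequence,
# then classify a slightly shifted horizontal-motion query.
train = np.stack([
    hog_descriptor(motion_history_image(make_frames(2, 0))),
    hog_descriptor(motion_history_image(make_frames(0, 2))),
])
labels = np.array(["walk_right", "sit_down"])     # hypothetical action labels
query = hog_descriptor(motion_history_image(make_frames(2, 0, start=(6, 4))))
print(nn_classify(query, train, labels))
```

Because the MHI collapses an entire sequence into a single grayscale template and the HOG descriptor is low-dimensional, this pipeline needs neither per-frame features nor a fixed camera configuration, which is what gives the approach its low memory, bandwidth, and compute cost.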