Abstract

In this paper, we propose a human action recognition method using HOIRM (histogram of oriented interest region motion) feature fusion and a BOW (bag of words) model based on AP (affinity propagation) clustering. First, a HOIRM feature extraction method based on the ROI of spatiotemporal interest points is proposed. HOIRM can be regarded as a mid-level feature between local and global features. Then, HOIRM is fused with 3D HOG and 3D HOF local features using a cumulative histogram. This fusion further improves the robustness of local features to variations in camera viewing angle and distance in complex scenes, which in turn improves the accuracy of action recognition. Finally, a BOW model based on AP clustering is proposed and applied to action classification. It determines an appropriate visual dictionary capacity automatically and achieves a better clustering effect for the joint description of multiple features. The experimental results demonstrate that, using the fused features with the proposed BOW model, the average recognition rate is 95.75% on the KTH database and 88.25% on the UCF database, both higher than those obtained using only 3D HOG+3D HOF or HOIRM features. Moreover, the average recognition rate achieved by the proposed method on the two databases is higher than that of the other methods compared.
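The abstract's key algorithmic point is that AP clustering selects the visual dictionary capacity by itself, whereas K-Means requires it in advance. Below is a minimal sketch of that dictionary-building and quantization step, assuming scikit-learn's AffinityPropagation and random stand-in descriptors in place of the paper's fused 3D HOG + 3D HOF + HOIRM vectors; it illustrates the general technique, not the paper's implementation.

```python
# Sketch of a BOW pipeline with an AP-clustered dictionary (stand-in data).
import numpy as np
from sklearn.cluster import AffinityPropagation

def build_dictionary(descriptors, damping=0.9):
    """Cluster local descriptors; AP picks the number of exemplars
    (visual words) itself, so no dictionary size is specified."""
    ap = AffinityPropagation(damping=damping, random_state=0)
    ap.fit(descriptors)
    return ap.cluster_centers_

def bow_histogram(video_descriptors, dictionary):
    """Assign each descriptor to its nearest visual word and return a
    normalized word-frequency histogram for the video."""
    d2 = ((video_descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(dictionary)).astype(float)
    return hist / max(hist.sum(), 1.0)

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 64))        # stand-in for fused local features
dictionary = build_dictionary(train)
h = bow_histogram(rng.normal(size=(80, 64)), dictionary)
print(len(dictionary), "visual words; BOW vector shape:", h.shape)
```

AP's damping (and optional preference) parameter indirectly controls how many exemplars emerge; the settings used in the paper are not reproduced here.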

Highlights

  • The aim of video-based human action recognition is to recognize human action patterns in video; it has wide application in intelligent monitoring, human–computer interaction, and video search [1]

  • The KTH human action database is provided by Schuldt et al. [30] and is currently considered a standard benchmark for testing action recognition algorithms

  • We experimented with all the videos in the KTH database and constructed a visual dictionary using a BOW model based on K-Means clustering (a baseline sketch of this step follows the list)
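As a point of reference for the K-Means baseline mentioned in the last highlight, here is a minimal sketch of that dictionary step, again with stand-in descriptors; the dictionary size K is a placeholder, not the value used in the paper's KTH experiments.

```python
# Sketch of the K-Means baseline dictionary (stand-in data and K).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(500, 64))   # stand-in local features

K = 200  # unlike AP, K-Means needs the dictionary capacity chosen up front
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(train_descriptors)
dictionary = km.cluster_centers_                 # K visual words
print(dictionary.shape)                          # (200, 64)
```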



Introduction

The aim of video-based human action recognition is to recognize human action patterns in video; it has wide application in intelligent monitoring, human–computer interaction, and video search [1]. Human action recognition mainly comprises two steps: feature extraction and description, followed by action classification and recognition. Regarding the extraction and description of action features, video-based human action recognition methods are based either on global features or on local spatiotemporal interest points. The latter have become the mainstream owing to their robustness to various types of interference. These methods extract the underlying features for action description by detecting interest points through significant pixel value changes in the spatiotemporal neighborhood, without requiring object detection, image segmentation, or target tracking.
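To make the interest-point idea concrete, the sketch below flags spatiotemporal interest points as local maxima of temporal-change energy in a smoothed grayscale video volume. This is a simplified stand-in detector built on NumPy/SciPy, not the detector used in the paper, and all thresholds are placeholder assumptions.

```python
# Sketch: spatiotemporal interest points as strong local pixel changes.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def interest_points(video, sigma=1.5, tau=1.2, rel_thresh=0.5):
    """video: (T, H, W) grayscale array; returns (t, y, x) coordinates."""
    v = gaussian_filter(video.astype(float), sigma=(tau, sigma, sigma))
    resp = np.gradient(v, axis=0) ** 2           # temporal change energy
    local_max = resp == maximum_filter(resp, size=5)
    strong = resp > rel_thresh * resp.max()      # relative threshold
    return np.argwhere(local_max & strong)

# Toy video with a sudden appearance at frame 10:
video = np.zeros((20, 32, 32))
video[10:, 12:20, 12:20] = 1.0
print(interest_points(video)[:5])
```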

