Abstract
In this paper, a novel method based on a two-stream dictionary learning architecture is proposed for human action recognition. The architecture consists of an interest patch (IP) detector and descriptor, two-stream dictionary models, and a support vector machine (SVM) for classification. The novel IP detector combines a human detector and a contour detector to extract patches of interest on human contours. The IP descriptors are then computed separately in a spatial stream and a temporal stream. In each stream, a dictionary is trained for each action from the IP descriptors and serves as that action's model. In this way, measuring the similarity between an action sequence and an action model is transformed into reconstructing the IPs in the sequence with the model and computing the reconstruction error. For each action, an IP distribution histogram is constructed, and the histogram is then used to train an SVM classifier in each stream. A score fusion method combines the spatial and temporal SVM classification results to make the final decision. The proposed architecture is evaluated on four public data sets with different background complexities and camera motion conditions: the Weizmann, KTH, Olympic Sports, and HMDB51 data sets. The results are further compared with state-of-the-art approaches in the experiment section to confirm the effectiveness of this architecture.
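The idea of scoring a sequence by how well each action dictionary reconstructs its IPs can be illustrated with a small sketch. It is not the authors' implementation. Dictionary training is omitted, each action model is assumed to be a list of unit-norm atoms, and a 1-sparse code (the single best-matching atom) stands in for whatever sparse coding solver the paper uses. Each IP is assigned to the action with the lowest reconstruction error, and the normalized assignment counts form the IP distribution histogram, which is one plausible reading of the abstract. The SVMs are not shown. Score fusion is assumed to be a weighted sum with a hypothetical weight `w`. All function names (`one_sparse_error`, `ip_histogram`, `fuse_scores`) are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def one_sparse_error(x: Vector, atoms: List[List[float]]) -> float:
    """Squared error of reconstructing x with the single best unit-norm atom.

    For a unit-norm atom d the optimal coefficient is <d, x>, so the
    residual energy is ||x||^2 - <d, x>^2 (assumption: 1-sparse coding).
    """
    energy = sum(v * v for v in x)
    best = max(sum(d_i * x_i for d_i, x_i in zip(d, x)) ** 2 for d in atoms)
    return max(0.0, energy - best)  # clamp tiny negative rounding errors


def ip_histogram(descriptors: List[Vector],
                 dictionaries: List[List[List[float]]]) -> List[float]:
    """Assign each IP to the action dictionary with the lowest error,
    then return the normalized assignment counts (hypothetical histogram)."""
    counts = [0] * len(dictionaries)
    for x in descriptors:
        errors = [one_sparse_error(x, atoms) for atoms in dictionaries]
        counts[errors.index(min(errors))] += 1
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(dictionaries)
    return [c / total for c in counts]


def fuse_scores(spatial: List[float], temporal: List[float],
                w: float = 0.5) -> int:
    """Weighted-sum fusion of per-class scores (assumed fusion rule);
    returns the index of the winning action class."""
    fused = [w * s + (1 - w) * t for s, t in zip(spatial, temporal)]
    return fused.index(max(fused))


if __name__ == "__main__":
    # Two toy action models, each with one unit-norm atom in a 2-D descriptor space.
    dictionaries = [[normalize([1.0, 0.0])], [normalize([0.0, 1.0])]]
    ips = [[2.0, 0.1], [1.5, 0.2], [0.1, 3.0]]
    print(ip_histogram(ips, dictionaries))
    print(fuse_scores([0.9, 0.3], [0.4, 0.5]))  # → 0
```

A 1-sparse code keeps the sketch dependency-free and makes the reconstruction error closed-form. A real system would learn larger dictionaries and use a multi-atom sparse coder, but the decision rule (lower reconstruction error means higher similarity to the action model) stays the same.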
More From: IEEE Transactions on Circuits and Systems for Video Technology