Abstract

Long Short-Term Memory networks are making significant inroads into time series applications, including human action recognition. In a human action video, the spatial and temporal streams carry distinctive yet prominent information, hence many researchers turn to spatio-temporal models for human action recognition. A spatio-temporal model integrates a temporal network (e.g. Long Short-Term Memory) and a spatial network (e.g. Convolutional Neural Networks). There are a few challenges in existing human action recognition: (1) the uni-directional modeling of Long Short-Term Memory makes it unable to preserve information from the future, (2) the sparse sampling strategy tends to lose prominent information when performing dimension reduction on the input of the Long Short-Term Memory, and (3) the fusion strategy for consolidating the temporal network and spatial network relies on fixed weights. In view of this, we propose a Bidirectional Long Short-Term Memory with Temporal Dense Sampling and Fusion Network method to address these challenges. Temporal Dense Sampling partitions the human action video into segments and then performs a max-pooling operation along the temporal axis within each segment. A multi-stream bidirectional Long Short-Term Memory network is adopted to encode the long-term spatial and temporal dependencies in both forward and backward directions. Instead of assigning fixed weights to the spatial network and temporal network, we propose a fusion network in which a fully-connected layer is trained to adaptively assign the weights to the networks. The empirical results demonstrate that the proposed method outperforms the state-of-the-art methods, with an accuracy of 94.78% on the UCF101 dataset and 70.72% on the HMDB51 dataset.
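The Temporal Dense Sampling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-frame feature vectors are already extracted, partitions them into equal segments along the temporal axis, and max-pools each segment. The function name and the NumPy representation are illustrative choices.

```python
import numpy as np

def temporal_dense_sampling(frames: np.ndarray, num_segments: int) -> np.ndarray:
    """Partition per-frame features (T, D) into num_segments segments
    along the temporal axis and max-pool each segment over time.

    Returns an array of shape (num_segments, D), one pooled feature
    vector per segment -- a fixed-length input for the downstream
    bidirectional LSTM.
    """
    # np.array_split tolerates T not being divisible by num_segments
    segments = np.array_split(frames, num_segments, axis=0)
    return np.stack([seg.max(axis=0) for seg in segments])

# Toy example: 12 frames with 4-dimensional features, 3 segments.
feats = np.arange(48, dtype=float).reshape(12, 4)
pooled = temporal_dense_sampling(feats, 3)  # shape (3, 4)
```

Because every frame in a segment contributes to the pooled feature, this dense strategy retains salient activations that a sparse frame-sampling scheme could miss.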
