Temporal Pyramid Research Articles

Recently, substantial research effort has focused on applying CNNs or RNNs to better capture temporal patterns in videos and thereby improve video classification accuracy. In this paper, we investigate the potential of purely attention-based local feature integration. Accounting for the characteristics of such features in video classification, we first propose Basic Attention Clusters (BAC), which concatenates the outputs of multiple attention units applied in parallel, and introduce a shifting operation to capture more diverse signals. Experiments show that BAC achieves excellent results on multiple datasets. However, BAC treats all feature channels as an indivisible whole, which is suboptimal for finer-grained local feature integration over the channel dimension. Additionally, it treats the entire local feature sequence as an unordered set, thus ignoring sequential relationships. To improve on BAC, we further propose a channel pyramid attention schema, which splits features into sub-features at multiple scales to model coarse-to-fine sub-feature interactions, and a temporal pyramid attention schema, which divides the feature sequence into ordered sub-sequences of multiple lengths to account for sequential order. Our final model, pyramid×pyramid attention clusters (PPAC), combines channel pyramid attention and temporal pyramid attention to focus on the most important sub-features while also preserving the temporal information of the video. We demonstrate the effectiveness of PPAC on seven real-world video classification datasets. Our model achieves competitive results across all of them, showing that the proposed framework consistently outperforms existing local feature integration methods across a range of different scenarios.
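The BAC and temporal pyramid ideas above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several details are assumptions: each attention unit scores frames with a single linear vector (a stand-in for a learned scorer), the shifting operation is assumed to take the form (αv + β) followed by L2 normalisation and division by √N for N units, and α, β and the unit weights are fixed hypothetical values rather than learned parameters. The channel pyramid, which would split along the feature dimension instead of time, is omitted for brevity.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_unit(seq: Sequence[Vector], w: Vector) -> Vector:
    """One attention unit: softmax(w . x_t) weights, then a weighted sum of frames.
    A single linear scoring vector stands in for the (assumed) learned scorer."""
    weights = softmax([sum(wi * xi for wi, xi in zip(w, x)) for x in seq])
    dim = len(seq[0])
    return [sum(a * x[d] for a, x in zip(weights, seq)) for d in range(dim)]


def shift(v: Vector, alpha: float, beta: float, n_units: int) -> Vector:
    """Assumed shifting form: (alpha*v + beta), L2-normalised, divided by sqrt(N).
    alpha and beta would be learnable per unit; here they are fixed (hypothetical)."""
    shifted = [alpha * vi + beta for vi in v]
    norm = math.sqrt(sum(s * s for s in shifted)) or 1.0  # avoid division by zero
    return [s / (math.sqrt(n_units) * norm) for s in shifted]


def attention_cluster(seq: Sequence[Vector], unit_weights: Sequence[Vector],
                      alpha: float = 1.0, beta: float = 0.1) -> Vector:
    """BAC-style cluster: N attention units applied in parallel, outputs concatenated."""
    n = len(unit_weights)
    out: Vector = []
    for w in unit_weights:
        out.extend(shift(attention_unit(seq, w), alpha, beta, n))
    return out


def temporal_pyramid(seq: Sequence[Vector], unit_weights: Sequence[Vector],
                     levels: int = 2) -> Vector:
    """Level k splits the sequence into 2**k ordered, non-empty sub-sequences and
    runs the cluster on each; concatenating in order preserves temporal position."""
    if len(seq) < 2 ** (levels - 1):
        raise ValueError("sequence too short for the requested pyramid depth")
    out: Vector = []
    for k in range(levels):
        parts = 2 ** k
        for p in range(parts):
            start = p * len(seq) // parts
            end = (p + 1) * len(seq) // parts
            out.extend(attention_cluster(seq[start:end], unit_weights))
    return out


# Toy usage: 8 frames with D = 4 features, N = 2 hypothetical attention units.
frames = [[math.sin(t + d) for d in range(4)] for t in range(8)]
units = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
features = temporal_pyramid(frames, units, levels=2)
print(len(features))  # (1 + 2) sub-sequences x 2 units x 4 dims = 24
```

A plain attention cluster over the whole sequence ignores frame order; the pyramid recovers a coarse notion of order because each sub-sequence's output occupies a fixed position in the concatenated feature.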

Convolutional Neural Networks (CNNs) for action recognition usually rely on top-level appearance features of video frames. However, these methods discard the complementary information carried by appearance representations at different scales, which has proven effective for object detection, instance segmentation and person re-identification. In this paper, a new spatial pyramid module is proposed to make full use of the inherent multi-scale information of CNNs at nearly no extra cost: a bottom-up architecture with lateral connections combines high-, mid- and low-level CNN representations into a hierarchical frame-level feature. In addition, temporal relations at an appropriate timescale contribute to identifying an action. To this end, we also propose a new temporal pyramid module in which the frame-level features of a video are reused by pooling at various timescales to efficiently obtain snippet features of different temporal granularities. Snippet-relation reasoning then derives temporal relations at each timescale, and these are accumulated into a comprehensive prediction. Unifying the proposed spatial and temporal pyramid modules, a novel network, the Spatial-Temporal Pyramid Network (S-TPNet), is proposed to extract spatial-temporal pyramid features for action recognition in videos. Unlike previous models that boost performance at the cost of computation, S-TPNet can be trained end-to-end with high efficiency. Extensive experiments on Kinetics, UCF101, and HMDB51 demonstrate that S-TPNet achieves significant performance improvements over existing frameworks and performs competitively with the state of the art. The code is available at https://www.github.com/ZhenxingZheng/S-TPNet.
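The temporal pyramid module described above (reuse the same frame-level features, pool them at several timescales, reason over snippet relations, accumulate) can be sketched in pure Python. The abstract does not specify the pooling type, the relation function or the aggregation, so this sketch assumes average pooling over non-overlapping windows, a linear score over each ordered snippet pair (a stand-in for a learned relation network), and summation across timescales. The names `scales` and `class_weights` and their values are hypothetical.

```python
import itertools
from typing import List, Sequence

Vector = List[float]


def pool_snippets(frames: Sequence[Vector], scale: int) -> List[Vector]:
    """Average-pool non-overlapping windows of `scale` consecutive frames.
    Trailing frames that do not fill a whole window are dropped (an assumption)."""
    dim = len(frames[0])
    snippets: List[Vector] = []
    for start in range(0, len(frames) - scale + 1, scale):
        window = frames[start:start + scale]
        snippets.append([sum(f[d] for f in window) / scale for d in range(dim)])
    return snippets


def relation_scores(snippets: Sequence[Vector],
                    class_weights: Sequence[Vector]) -> List[float]:
    """Score every ordered snippet pair (earlier, later) with a linear map over
    the concatenated pair, averaged over pairs; a stand-in for a learned MLP."""
    pairs = list(itertools.combinations(snippets, 2))  # keeps earlier-before-later order
    scores = []
    for w in class_weights:
        total = sum(sum(wi * xi for wi, xi in zip(w, a + b)) for a, b in pairs)
        scores.append(total / len(pairs))
    return scores


def temporal_pyramid_scores(frames: Sequence[Vector], scales: Sequence[int],
                            class_weights: Sequence[Vector]) -> List[float]:
    """Reuse the same frame-level features at every timescale and accumulate
    (sum) the per-timescale relation scores into one prediction."""
    total = [0.0] * len(class_weights)
    for s in scales:
        snippets = pool_snippets(frames, s)
        if len(snippets) < 2:
            continue  # a relation needs at least two snippets
        for c, score in enumerate(relation_scores(snippets, class_weights)):
            total[c] += score
    return total


# Toy usage: 8 frames whose first feature grows with time (D = 2).
frames = [[float(t), 1.0] for t in range(8)]
# Hypothetical 2-class weights over a concatenated pair [a0, a1, b0, b1]:
# class 0 looks at the earlier snippet, class 1 at the later one.
class_weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
print(temporal_pyramid_scores(frames, scales=[1, 2, 4], class_weights=class_weights))
```

The frame features are computed once and only re-pooled per timescale, which is why the abstract can claim the multi-scale snippets come at little extra cost; in this toy example the "later" class scores higher because the features increase over time.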

Articles published on Temporal Pyramid

Temporal Pyramid Network for Pedestrian Trajectory Prediction with Multi-Supervision

Temporal Pyramid Network With Spatial-Temporal Attention for Pedestrian Trajectory Prediction

Temporal pyramid attention‐based spatiotemporal fusion model for Parkinson's disease diagnosis from gait data

Temporal Pyramid Pooling for Decoding Motor-Imagery EEG Signals

Temporal Pyramid Recurrent Neural Network

Current principles of management of massive cholesteatoma and temporal bone pyramid defect

A Deep Learning Framework for Assessing Physical Rehabilitation Exercises

Purely Attention Based Local Feature Integration for Video Classification

ConvNets-based action recognition from skeleton motion maps

Temporal Action Detection with Structured Segment Networks

Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

Attentive Temporal Pyramid Network for Dynamic Scene Classification

A weighting scheme for mining key skeletal joints for human action recognition

Spatial-temporal pyramid based Convolutional Neural Network for action recognition

Heterogeneous hand gesture recognition using 3D dynamic skeletal data

Multiscale Fine-Grained Heart Rate Variability Analysis for Recognizing the Severity of Hypertension

Human Activity Recognition with Posture Tendency Descriptors on Action Snippets

Action Recognition Using Multiple Pooling Strategies of CNN Features

Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching

Efficient human motion capture data annotation via multi‐view spatiotemporal feature fusion
