Abstract

Temporal information is important for recognizing human actions in video. With widely used spatio-temporal neural networks, researchers have found that the learned high-level features retain overfitted spatial information but only limited temporal information, which leads to inferior performance. This is because existing networks lack effective regularization of the temporal structure. To learn more robust temporal features, we propose a temporal regularization method named Temporal Segment Dropout (TSD). Within each clip of temporal segments, TSD drops the most salient spatial features so that the network must rely more on temporal features. Because it does not require learning from complex examples, TSD can be easily deployed in existing networks. In experiments, TSD is extensively evaluated on benchmark action recognition datasets and brings consistent improvements over the baselines, especially on action-centric classes.
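
The abstract does not specify how saliency is measured or how many locations are dropped. The following is a minimal sketch of the idea under two assumptions of ours: saliency is taken as the channel-averaged activation magnitude at each spatial location, and the top fraction `p` of locations in every temporal segment is zeroed during training. The module name, the drop ratio `p`, and the rescaling step are illustrative choices, not details from the paper.

```python
import torch
import torch.nn as nn

class TemporalSegmentDropout(nn.Module):
    """Hypothetical sketch of TSD: per temporal segment, zero out the most
    salient spatial locations so the network must rely on temporal cues.
    The channel-mean saliency measure and drop ratio `p` are assumptions."""

    def __init__(self, p: float = 0.1):
        super().__init__()
        self.p = p  # fraction of spatial positions to drop in each segment

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) -- features for the T temporal segments of a clip
        if not self.training or self.p <= 0:
            return x
        b, t, c, h, w = x.shape
        # Saliency per spatial location: mean absolute activation over channels
        saliency = x.abs().mean(dim=2).reshape(b, t, h * w)   # (B, T, H*W)
        k = max(1, int(self.p * h * w))                        # positions to drop
        # Indices of the k most salient positions in every segment
        _, top_idx = saliency.topk(k, dim=-1)                  # (B, T, k)
        mask = torch.ones(b, t, h * w, device=x.device, dtype=x.dtype)
        mask.scatter_(-1, top_idx, 0.0)                        # zero salient spots
        mask = mask.reshape(b, t, 1, h, w)                     # broadcast over C
        # Rescale like standard dropout to keep the expected activation scale
        return x * mask / (1.0 - self.p)
```

Under these assumptions, such a module would sit between a backbone's spatio-temporal feature extractor and its classification head, and would be active only at training time, like ordinary dropout.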
