Abstract

Video datasets contain substantial inter-frame redundancy, which hinders effective learning in deep networks and inflates computational costs. Several methods therefore adopt random/uniform frame sampling or key-frame selection techniques. Unfortunately, most learnable frame-selection methods are customized for specific models and lack generality, independence, and scalability. In this paper, we propose a novel two-stage video-to-video summarization method, termed FastPicker, which efficiently selects the most discriminative and representative frames for better action recognition. The two stages operate independently: the first selects discriminative frames based on inter-frame motion computation, while the second selects representative frames using a novel Transformer-based model. Learnable frame embeddings are proposed to estimate each frame's contribution to the certainty of the final video classification; the frames with the largest contributions are thus the most representative. The proposed method is carefully evaluated by summarizing several action recognition datasets and using the summaries to train various deep models with several backbones. The experimental results demonstrate a remarkable performance boost on the Kinetics400, Something-Something-v2, ActivityNet-1.3, UCF-101, and HMDB51 datasets; for example, FastPicker downsizes Kinetics400 by 78.7% of its size while improving human activity recognition.
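To make the first stage concrete, the sketch below illustrates one simple way to rank frames by inter-frame motion and keep the top-k in temporal order. It is a minimal toy assuming grayscale frames and mean absolute pixel difference as the motion score; the paper's actual motion computation and the helper name `select_discriminative_frames` are illustrative assumptions, not the authors' implementation.

```python
def select_discriminative_frames(frames, k):
    """Toy sketch: keep the k frames with the largest inter-frame motion.

    `frames` is a list of 2-D lists of pixel intensities (toy grayscale
    frames). This is a hypothetical illustration of motion-based frame
    selection, not FastPicker's exact algorithm.
    """
    def motion(prev, curr):
        # Mean absolute pixel difference between two consecutive frames.
        total = sum(abs(a - b)
                    for row_p, row_c in zip(prev, curr)
                    for a, b in zip(row_p, row_c))
        return total / (len(prev) * len(prev[0]))

    # The first frame has no predecessor, so its motion score is zero.
    scores = [0.0] + [motion(frames[t - 1], frames[t])
                      for t in range(1, len(frames))]
    # Take the k highest-scoring frame indices, restored to temporal order.
    return sorted(sorted(range(len(frames)), key=scores.__getitem__)[-k:])
```

In a real pipeline the selected indices would be used to subsample the video before feeding it to the recognition model, which is where the dataset downsizing comes from.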
