Abstract
Recent years have witnessed growing interest in compressed video action recognition, driven by the rapid growth of online video. Working in the compressed domain remarkably reduces storage by replacing raw videos with sparsely sampled RGB frames and other compressed motion cues (motion vectors and residuals). However, existing compressed video action recognition methods face two main issues: first, inefficiency caused by processing coarse-level information at full resolution; and second, disturbance from the noisy dynamics in motion vectors. To address these two issues, this paper proposes a dynamic spatial focus method for efficient compressed video action recognition (CoViFocus). Specifically, we first use a lightweight two-stream architecture to localize the task-relevant patches in both the RGB frames and the motion vectors. The selected patch pair is then processed by a high-capacity two-stream deep model for the final prediction. This patch selection strategy crops out the irrelevant motion noise in the motion vectors and reduces the spatial redundancy of the inputs, leading to the high efficiency of our method in the compressed domain. Moreover, we find that the motion vectors help our method address the potential "static-issue", in which the focus patches get stuck at regions related to static objects rather than the target action, which further improves our method. Extensive results on both the HMDB-51 and UCF-101 datasets demonstrate the effectiveness and efficiency of our method in compressed video action recognition tasks.
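The two-stage pipeline described above (a lightweight focus stage that localizes a task-relevant patch from the RGB frame and motion vectors, followed by cropping the patch pair for a high-capacity classifier) can be sketched in plain Python. This is a minimal illustrative sketch, not the paper's implementation: the focus stage is replaced by a simple motion-magnitude heuristic, the classifier by a placeholder, and all function names (`focus`, `crop`, `recognize`) are assumptions.

```python
# Hypothetical sketch of a CoViFocus-style two-stage pipeline.
# All names and the scoring heuristic are illustrative assumptions,
# not the paper's actual architecture.

def focus(frame, motion_vectors, patch_size):
    """Lightweight stage: score every candidate patch location by the
    total motion-vector magnitude inside it, and return the top-scoring
    top-left corner (a stand-in for the learned focus network)."""
    h, w = len(frame), len(frame[0])
    best, best_score = (0, 0), -1.0
    for y in range(h - patch_size + 1):
        for x in range(w - patch_size + 1):
            score = sum(
                abs(motion_vectors[y + dy][x + dx])
                for dy in range(patch_size)
                for dx in range(patch_size)
            )
            if score > best_score:
                best, best_score = (y, x), score
    return best

def crop(grid, y, x, patch_size):
    """Crop a patch_size x patch_size window from a 2-D grid."""
    return [row[x:x + patch_size] for row in grid[y:y + patch_size]]

def recognize(frame, motion_vectors, patch_size=2):
    """Two-stage inference: locate the focus patch once, then crop the
    same window from both streams. A real system would feed the patch
    pair to a high-capacity two-stream network for the class scores."""
    y, x = focus(frame, motion_vectors, patch_size)
    return {
        "patch_origin": (y, x),
        "rgb": crop(frame, y, x, patch_size),
        "mv": crop(motion_vectors, y, x, patch_size),
    }
```

Because both streams are cropped to the same small window, the expensive second-stage model sees only a fraction of the full-resolution input, which is where the efficiency gain in the compressed domain comes from.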
IEEE Transactions on Circuits and Systems for Video Technology