Abstract

Temporal modeling remains a challenge for action recognition. Most existing temporal models focus on learning local variations between neighboring frames. However, there exist obvious deviations between local and global variations, such as subtle versus notable motion changes. In this paper, we propose a global temporal difference module for action recognition, which consists of two sub-modules, i.e., a global aggregation module and a global difference module. These two sub-modules cooperate following the idea of using prior knowledge from the global view (i.e., global motion variation) to guide local learning at each moment. In the global aggregation module, the global prior knowledge is learned by aggregating the visual feature sequence of a video into a global vector. In the global difference module, we build the difference vector sequence of the video by subtracting each local vector from the global vector. Our method thus acts as contextual guidance with a global view. The sequential dependency between these difference vectors is exploited with a channel-wise self-attention operation. Finally, the difference vectors at each timestamp are further used to enhance the semantics of the original local features. The enhanced features allow the action recognizer to understand the variation in the video globally with less deviation. We instantiate the global temporal difference module into the ResNet block to form a global temporal difference network (GTDNet). Exhaustive experiments are conducted, and our method achieves competitive performance at small FLOPs on Something-Something V1 & V2 and Kinetics-400.
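The pipeline described in the abstract (aggregate the feature sequence into a global vector, subtract each local vector from it, relate the resulting difference vectors with self-attention, then add them back to enhance the local features) can be sketched as follows. This is a minimal illustrative reading, not the paper's implementation: mean pooling for aggregation, a single dot-product attention over the difference sequence, and a residual enhancement are all assumptions, and the function name `global_temporal_difference` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_temporal_difference(features):
    """Sketch of the module on a (T, C) per-video feature sequence.

    T = number of timestamps (frames), C = channel dimension.
    """
    # Global aggregation module: pool the sequence into one global
    # vector (assumption: simple temporal mean pooling).
    global_vec = features.mean(axis=0)                      # (C,)

    # Global difference module: subtract each local vector from the
    # global vector to form the difference sequence.
    diffs = global_vec[None, :] - features                  # (T, C)

    # Exploit sequential dependency among difference vectors with a
    # self-attention operation (simplified single-head sketch; the
    # paper's channel-wise variant may differ).
    scores = diffs @ diffs.T / np.sqrt(diffs.shape[1])      # (T, T)
    context = softmax(scores, axis=-1) @ diffs              # (T, C)

    # Use the attended differences to enhance the semantics of the
    # original local features (assumption: residual addition).
    return features + context

feats = np.random.default_rng(0).normal(size=(8, 16))       # 8 frames, 16 channels
out = global_temporal_difference(feats)
```

The enhanced output keeps the shape of the input sequence, so a module like this could be dropped into a ResNet block, as GTDNet does, without changing the surrounding architecture.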
