Abstract

Effectively tackling the problem of temporal action localization (TAL) necessitates a visual representation that jointly pursues two conflicting goals, i.e., fine-grained discrimination for temporal localization and sufficient visual invariance for action classification. We address this challenge by enriching the local, global and multi-scale contexts in the popular two-stage temporal localization framework. Our proposed model, dubbed ContextLoc++, can be divided into three sub-networks: L-Net, G-Net, and M-Net. L-Net enriches the local context via fine-grained modeling of snippet-level features, which is formulated as a query-and-retrieval process. Furthermore, the spatial and temporal snippet-level features, functioning as keys and values, are fused by temporal gating. G-Net enriches the global context via higher-level modeling of the video-level representation. In addition, we introduce a novel context adaptation module to adapt the global context to different proposals. M-Net further fuses the local and global contexts with multi-scale proposal features. Specifically, proposal-level features from multi-scale video snippets can focus on different action characteristics: short-term snippets with fewer frames attend to action details, while long-term snippets with more frames capture action variations. Experiments on the THUMOS14 and ActivityNet v1.3 datasets validate the efficacy of our method against existing state-of-the-art TAL algorithms.
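The abstract describes fusing spatial and temporal snippet-level features by temporal gating within L-Net. Below is a minimal sketch of one plausible per-channel gated fusion, assuming 1024-dimensional two-stream (RGB and optical-flow) snippet features; the module name, dimensions, and exact gating form are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a temporal gating fusion of two-stream snippet features.
# All names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class TemporalGatingFusion(nn.Module):
    """Fuse spatial (RGB) and temporal (flow) snippet features with a learned gate."""
    def __init__(self, dim: int = 1024):
        super().__init__()
        # The gate is predicted jointly from both streams, one weight per channel.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, spatial: torch.Tensor, temporal: torch.Tensor) -> torch.Tensor:
        # spatial, temporal: (num_snippets, dim) snippet-level features.
        g = self.gate(torch.cat([spatial, temporal], dim=-1))
        # Convex per-channel combination of the two streams.
        return g * spatial + (1.0 - g) * temporal

# Example: fuse 100 snippets with 1024-dimensional descriptors per stream.
fusion = TemporalGatingFusion(dim=1024)
fused = fusion(torch.randn(100, 1024), torch.randn(100, 1024))
print(fused.shape)  # torch.Size([100, 1024])
```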
