Abstract

In open-world scenarios, analyzing action events from multiple viewpoints is crucial for a holistic understanding, which mirrors how humans perceive events. Synthesizing information across viewpoints is challenging, however, because multi-view data can induce inter-view regularization that complicates learning. This paper is the first to study Online Action Detection (OAD) through a multi-view lens, underscoring the value of cross-observation in enriching view-level information. To harness the spatiotemporal dynamics of multi-view video sequences, it proposes Annealing Temporal–Spatial Contrastive Learning (ATSCL), which consists of Annealing Temporal Contrastive Learning (ATCL) and Spatial Contrastive Learning (SCL) and is designed for compatibility with RNN-based models. ATCL employs an annealing temporal loss, driven by a temporal annealing sampling mechanism, to uncover intrinsic video structure. Concurrently, SCL uses a spatial loss to draw representations from different viewpoints closer together, mitigating the regularization effects. ATSCL frees multi-view OAD training from the stringent requirement of synchronized training videos, so that OAD can be performed asynchronously. Experiments show that integrating ATSCL improves RNN-based models by an average of 5.92% on the DAHLIA dataset, 3.36% on the IKEA ASM dataset, 2.92% on the BREAKFAST dataset, and 0.6% (mAP) on the THUMOS’14 dataset, underscoring ATSCL's efficacy across different RNN structures.
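
The abstract does not give the form of the spatial loss, so the following is only an illustrative sketch of the general idea of drawing representations from different viewpoints closer together. It assumes a standard InfoNCE-style contrastive objective over cosine similarities, which may differ from the loss actually used in ATSCL. The function names, the `temperature` parameter, and the toy feature vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_view_info_nce(view_a, view_b, temperature=0.1):
    """Generic InfoNCE-style loss across two viewpoints (an assumption,
    not the paper's confirmed formulation).

    view_a[i] and view_b[i] are features of the same action seen from two
    viewpoints (a positive pair); view_b[j] with j != i act as negatives.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(
            sum(math.exp(value - peak) for value in logits)
        )
        # Negative log-probability of picking the true cross-view match.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy features: two actions, each observed from two viewpoints.
features_view_a = [[1.0, 0.0], [0.0, 1.0]]
features_view_b_aligned = [[0.9, 0.1], [0.1, 0.9]]
features_view_b_mismatched = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = cross_view_info_nce(features_view_a, features_view_b_aligned)
loss_mismatched = cross_view_info_nce(features_view_a, features_view_b_mismatched)
```

In this toy setup the loss is lower when the two viewpoints agree on each action than when their features are mismatched, which is the behavior a cross-view contrastive term is meant to reward.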
