Abstract

Inspired by Faster R-CNN, current state-of-the-art region-based action detection approaches such as R-C3D and TAL-Net introduced the Temporal Region Proposal Network (TRPN) to generate proposals, which greatly improved action detection accuracy. However, the smooth L1 loss adopted in TRPN regresses offsets relative to preset anchor segments. As a result, it is not sensitive enough to action boundaries and temporal regions, which leaves room for improvement in temporal proposal generation. In this work, we design a Temporal Locality-Aware Network (TLAN) that learns a binary classifier from frame-level annotations. TLAN jointly optimizes temporal region classification and temporal reference box regression. This allows our framework to effectively distinguish action instances (positive temporal regions) from background (negative temporal regions) and to localize actions precisely. We further introduce a novel pooling method, Contextual Structured Spatial Temporal Pooling (CSSTP), to better exploit contextual and spatial-temporal information in an end-to-end fashion. Finally, TLAN and CSSTP are incorporated into a unified framework named AFNet. Extensive experiments show that our method achieves state-of-the-art performance on THUMOS’14, with mAP@0.5 20.6 points higher than R-C3D and 6.7 points higher than TAL-Net. It also achieves competitive performance on Charades and ActivityNet. In addition, our inference speed reaches 1024 FPS, more than 250× faster than TAL-Net (3.5 FPS) and comparable to R-C3D (1030 FPS).
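The following is a minimal sketch that illustrates why anchor-relative smooth L1 regression can be insensitive to boundaries, and how frame-level annotations yield a dense binary target instead. It assumes the center/length offset parameterization commonly used by Faster R-CNN-style temporal detectors. The helpers `encode_offsets`, `frame_labels`, and `binary_cross_entropy` are hypothetical illustrations, not AFNet's, R-C3D's, or TAL-Net's actual implementation.

```python
import math

def smooth_l1(x, beta=1.0):
    """Smooth L1 loss on one residual: quadratic below beta, linear above."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

def encode_offsets(anchor, seg):
    """Assumed parameterization: center offset scaled by anchor length, log length ratio.

    anchor and seg are (start, end) frame indices.
    """
    a_c, a_l = (anchor[0] + anchor[1]) / 2, anchor[1] - anchor[0]
    s_c, s_l = (seg[0] + seg[1]) / 2, seg[1] - seg[0]
    return (s_c - a_c) / a_l, math.log(s_l / a_l)

def regression_loss(anchor, pred_offsets, gt):
    """Sum of smooth L1 losses between predicted and target offsets."""
    target = encode_offsets(anchor, gt)
    return sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target))

def frame_labels(num_frames, segments):
    """Hypothetical frame-level binary labels: 1 inside any action segment, else 0."""
    labels = [0] * num_frames
    for start, end in segments:
        for t in range(max(0, start), min(num_frames, end)):
            labels[t] = 1
    return labels

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean per-frame binary cross-entropy between predicted probabilities and labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

if __name__ == "__main__":
    anchor, gt = (0, 64), (10, 40)
    shifted = (12, 42)  # prediction whose boundaries are off by 2 frames
    reg = regression_loss(anchor, encode_offsets(anchor, shifted), gt)
    gt_frames = frame_labels(64, [gt])
    pred_frames = frame_labels(64, [shifted])
    mismatched = sum(g != p for g, p in zip(gt_frames, pred_frames))
    print("smooth L1 on offsets:", reg)
    print("mislabeled frames:", mismatched)
```

In this toy case, a 2-frame shift of both boundaries changes the normalized center offset by only 2/64, so the smooth L1 loss is about 5e-4. The frame-level view, by contrast, marks 4 of 64 frames as mislabeled, and each of them contributes directly to a per-frame classification loss.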
