Abstract

Temporal action proposal generation aims at localizing the temporal segments containing human actions in a video. This work proposes a centerness-aware network (CAN), which is a novel one-stage approach intended to generate action proposals as keypoint triplets. A keypoint triplet contains two boundary points (starting and ending) and one center point. Specifically, we evaluate the probabilities of each temporal location in the video whether it is at the boundaries or the center region of ground truth action proposals. CAN optimizes the predicted boundary points interactively in a bidirectional adaptation form by exploiting the dependencies among them. Furthermore, to accurately locate the center points of action proposals with different time spans, temporal feature pyramids are utilized to incorporate multi-scale information explicitly. Using the generated three keypoints, CAN efficiently retrieves temporal proposals by grouping keypoints into triplets if they are geometrically aligned. Experiments show that CAN achieves the state-of-the-art performance on the public THUMOS-14 and ActivityNet-1.3 datasets. Moreover, further experiments demonstrate that by applying action classifiers on proposals generated by CAN, our method achieves the state-of-the-art performance in temporal action localization.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.