Abstract

Temporal action detection not only requires correct classification but also needs to detect the start and end times of each action accurately. However, traditional approaches always employ sliding windows or actionness to predict the actions, and it is different to train to model with sliding windows or actionness by end-to-end means. In this article, we attempt a different idea to detect the actions end-to-end, which can calculate the probabilities of actions directly through one network as one part of the results. We present PCAD, a novel proposal complementary action detector to deal with video streams under continuous, untrimmed conditions. Our approach first uses a simple fully 3D convolutional network to encode the video streams and then generates candidate temporal proposals for activities by using anchor segments. To generate more precise proposals, we also design a boundary proposal network to offer some complementary information for the candidate proposals. Finally, we learn an efficient classifier to classify the generated proposals into different activities and refine their temporal boundaries at the same time. Our model can achieve end-to-end training by jointly optimizing classification loss and regression loss. When evaluating on the THUMOS’14 detection benchmark, PCAD achieves state-of-the-art performance in high-speed models.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.