Abstract
Temporal action detection in untrimmed videos is an important yet challenging task. Locating complex actions accurately remains an open problem because the boundaries between action instances and the background are ambiguous. A recently proposed approach, Structured Segment Networks (SSN), models the temporal structure of action instances via structured temporal pyramids and comprises two classifiers, one for classifying actions and one for determining proposal completeness. In this paper, we exploit temporal boundary information when modeling the temporal structure of action instances by introducing a structured temporal boundary attention pyramid into SSN. On top of the pyramid, we add another set of classifiers for unit-wise completeness evaluation, which enables proposal recycling for efficient action detection. Experimental results on two challenging benchmarks, THUMOS’14 and ActivityNet, indicate that our Temporal Boundary Network significantly improves on SSN and achieves performance competitive with state-of-the-art methods.