Abstract

Temporal action localisation is a key research direction in video understanding within computer vision. Current attention-based methods divide video frames only into action-instance frames and background frames. As a result, action context, which should belong to the background, is misclassified as part of an action instance. In addition, when training with point-supervised frame-level labels, action samples and background samples are imbalanced. The scarcity of background samples lowers the activation scores of background frames, so this imbalance hinders the separation of action instances from the background. Both problems reduce the accuracy of action classification and temporal localisation. This paper therefore proposes a multi-branch attention network and a pseudo-background label generation method. Experimental results show that the proposed method improves the separation of action instances, background, and action context. Moreover, the proposed model achieves excellent performance on the THUMOS-14 dataset.
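
The abstract does not specify the network's internals. The following is a minimal NumPy sketch of one generic way a multi-branch (here, three-branch) attention can separate frames into action instance, action context, and background instead of the two-way instance/background split the abstract criticises. It is an illustrative assumption, not the paper's architecture. The projection matrices `w_att` and `w_cls` are hypothetical stand-ins for learned layers, and the branch ordering (instance, context, background) is an assumed convention.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def three_branch_attention(features, w_att, w_cls, eps=1e-8):
    """Hypothetical three-branch attention over per-frame features.

    features: (T, D) per-frame video features
    w_att:    (D, 3) hypothetical projection to branch logits
              (assumed order: instance, context, background)
    w_cls:    (D, C) hypothetical classifier giving a class activation sequence
    Returns the per-frame branch attention (T, 3) and one video-level
    class distribution per branch (3, C).
    """
    # Each frame's attention is split across the three branches (rows sum to 1),
    # so a frame can be weighted as context rather than forced into instance/background.
    att = softmax(features @ w_att, axis=1)
    cas = features @ w_cls  # (T, C) class activation sequence

    branch_scores = []
    for b in range(att.shape[1]):
        weights = att[:, b:b + 1]  # (T, 1) attention for this branch
        # Attention-weighted temporal pooling of the class activations.
        video_logits = (weights * cas).sum(axis=0) / (weights.sum() + eps)
        branch_scores.append(softmax(video_logits))
    return att, np.stack(branch_scores)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    att, scores = three_branch_attention(
        rng.normal(size=(8, 16)), rng.normal(size=(16, 3)), rng.normal(size=(16, 5))
    )
    print(att.shape, scores.shape)
```

In a sketch like this, each branch's video-level scores could be supervised with a different target (for example, the action labels for the instance branch and a background target for the background branch), which is one way pseudo-background labels might enter training; how the paper actually generates and uses them is not stated in the abstract.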
