Abstract

The extraction of temporal features from video is essential for effective action recognition. Previous networks use optical flow as a temporal feature, exploiting the positional relationship between pixels. However, not all pixels in a video frame are meaningful: most belong to the background rather than to the action itself. In this paper, we propose a novel temporal feature extraction model, the Key points inter-Frame Transfer Module (KFTM), which extracts temporal features by tracking the transfer of multiple key points along the temporal axis. This information is equivalent to the temporal feature of the video, since it also represents the positional relationship between pixels, yet it is more efficient to compute because only key points are involved. To extract temporal features more effectively, we further add an attention mechanism that emphasizes the transfer of the key points most relevant to the action. Our proposed module obtains competitive performance on both benchmarks, with 96.49% accuracy on UCF101 and 77.48% accuracy on HMDB51.
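The core idea of the abstract — representing temporal features as attention-weighted inter-frame transfer of key points — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the function name, shapes, and the use of a fixed relevance score per key point are all assumptions.

```python
import numpy as np

def keypoint_transfer_feature(keypoints, relevance):
    """Attention-weighted inter-frame key-point displacement (illustrative sketch).

    keypoints : (T, K, 2) array of K key-point coordinates over T frames
    relevance : (K,) unnormalized relevance score per key point (assumed given;
                in the paper the attention would be learned)
    returns   : (T-1, 2) attention-weighted mean displacement per frame pair
    """
    # Per-point motion between consecutive frames: the "transfer" of each key point.
    transfer = np.diff(keypoints, axis=0)        # shape (T-1, K, 2)
    # Softmax over key points so transfer of action-relevant points dominates.
    attn = np.exp(relevance - relevance.max())
    attn = attn / attn.sum()
    # Weight each key point's motion by its attention and sum over key points.
    return np.einsum('tkc,k->tc', transfer, attn)

# Example: 3 frames, 2 key points; the first point is static background,
# the second moves one unit right per frame and gets high relevance.
kp = np.array([[[0., 0.], [1., 1.]],
               [[0., 0.], [2., 1.]],
               [[0., 0.], [3., 1.]]])
feat = keypoint_transfer_feature(kp, relevance=np.array([0.0, 5.0]))
```

Because attention concentrates on the moving key point, the resulting feature approximates its displacement while the static background point contributes almost nothing, illustrating why discarding background pixels makes the representation more efficient than dense optical flow.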
