Abstract

Many action recognition methods require significant computational resources to achieve good results on unedited videos, and their performance on infrared videos, which carry less visual information, is often unsatisfactory. In this paper, we propose a multi-scale difference joint key frame extraction algorithm for action recognition in infrared surveillance videos. To evaluate the algorithm, we built a dataset of 1200 unedited infrared surveillance videos spanning 10 action categories. Our algorithm extracts key frames by jointly analyzing the global and local differences between adjacent frames. Experiments show that, using only 10 frames, our algorithm improves the accuracy of generic action recognition algorithms by more than 10% on both our self-built dataset and the Infrared-Visible dataset. Moreover, the proposed method achieves high recognition accuracy at minimal computational cost even with a small number of frames, outperforming state-of-the-art methods by 1.82% on the Infrared-Visible dataset and by 1.35% on the InfAR dataset. These results highlight the effectiveness of our algorithm as a preprocessing module that significantly enhances the accuracy of generic action recognition models.
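The abstract describes key frame extraction as a joint analysis of global and local differences between adjacent frames, but does not specify the exact multi-scale criterion. The sketch below is only an illustration of that general idea: adjacent frames are scored by combining a whole-frame (global) difference with the strongest block-wise (local) difference, and the highest-scoring frames are kept. The function names, the 4x4 block grid, and the equal weighting of the two terms are assumptions for illustration, not the paper's method.

```python
# Illustrative sketch of difference-based key frame selection.
# NOTE: the paper's exact multi-scale joint criterion is not given in the
# abstract; the global/local scoring below is a hypothetical stand-in.
import cv2
import numpy as np

def frame_difference_score(prev, curr, grid=(4, 4)):
    """Joint score from the global and block-wise (local) difference of two grayscale frames."""
    diff = cv2.absdiff(curr, prev).astype(np.float32)
    global_score = diff.mean()                      # global (whole-frame) difference
    h, w = diff.shape
    bh, bw = h // grid[0], w // grid[1]
    blocks = diff[:bh * grid[0], :bw * grid[1]].reshape(grid[0], bh, grid[1], bw)
    local_score = blocks.mean(axis=(1, 3)).max()    # strongest local (block) change
    return global_score + local_score               # illustrative equal weighting

def extract_key_frames(video_path, num_frames=10):
    """Return indices of the num_frames frames with the largest joint difference scores."""
    cap = cv2.VideoCapture(video_path)
    scores, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            scores.append((frame_difference_score(prev, gray), idx))
        prev, idx = gray, idx + 1
    cap.release()
    # Keep the top-scoring frames, returned in temporal order.
    return sorted(i for _, i in sorted(scores, reverse=True)[:num_frames])
```

Selecting a fixed budget of 10 frames per video, as in the abstract's experiments, keeps the downstream recognition model's input small and its computational cost low.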
