Abstract

To address the large parameter counts and low recognition accuracy of action recognition networks, we propose an effective action recognition method based on a lightweight network and rough-fine keyframe extraction. The method consists of three modules. The first module is a keyframe extraction network based on grayscale values and feature descriptors, which employs a rough-fine strategy to extract video keyframes; it reduces keyframe redundancy and enhances the keyframes' ability to express action semantics. The second module is an attention-based feature extraction network that combines decoupling ideas with attention mechanisms to improve recognition accuracy while significantly reducing the number of network parameters. The third module is an improved attention module that optimizes the representation of local information. Finally, a residual module is added to fuse feature information between different convolutional layers. Experiments on two different datasets show that the proposed method has only 6.4M parameters. On the public HMDB51 and UCF101 datasets, it achieves recognition accuracies of 75.69% and 93.18%, respectively, without pre-training. The proposed method is valid and feasible on multiple public datasets.
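To make the rough-fine idea concrete, the sketch below illustrates one plausible two-stage selection scheme, not the authors' actual network: a rough pass keeps frames whose mean grayscale change from the last kept frame exceeds a threshold, and a fine pass drops near-duplicate candidates using a toy block-mean descriptor standing in for the paper's feature descriptors. All function names, thresholds, and the descriptor itself are illustrative assumptions.

```python
import numpy as np

def rough_fine_keyframes(frames, rough_thresh=10.0, fine_sim=0.98):
    """Two-stage keyframe selection sketch (illustrative, not the paper's method).

    Rough pass: keep frames whose mean absolute grayscale difference from the
    previously kept frame exceeds rough_thresh.
    Fine pass: among rough candidates, discard frames whose block-mean
    descriptor is nearly identical (cosine similarity > fine_sim) to the
    last accepted keyframe.
    """
    # Rough pass: cheap grayscale-change detection over the whole video.
    candidates = [0]
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float)
                      - frames[candidates[-1]].astype(float)).mean()
        if diff > rough_thresh:
            candidates.append(i)

    # A crude 4x4 block-mean "descriptor" (an assumption; the paper uses
    # proper feature descriptors for its fine stage).
    def descriptor(f):
        h, w = f.shape
        blocks = f[: h // 4 * 4, : w // 4 * 4].reshape(4, h // 4, 4, w // 4)
        return blocks.mean(axis=(1, 3)).ravel()

    # Fine pass: prune near-duplicates among the rough candidates.
    keyframes = [candidates[0]]
    for i in candidates[1:]:
        d_prev = descriptor(frames[keyframes[-1]].astype(float))
        d_cur = descriptor(frames[i].astype(float))
        sim = d_prev @ d_cur / (np.linalg.norm(d_prev)
                                * np.linalg.norm(d_cur) + 1e-9)
        if sim < fine_sim:
            keyframes.append(i)
    return keyframes
```

On a synthetic clip whose content changes once (e.g. five dark frames followed by five bright frames), this sketch keeps exactly one keyframe per distinct segment, which captures the stated goal of reducing redundancy while preserving action semantics.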
