Abstract

To make full use of the effective information in video, this paper proposes a multi-model interactive video behavior recognition method. To address the problems of incomplete human target detection and redundant feature extraction, YOLO_V4 is used to detect the human body and remove redundant background information. The channel attention module SE-Net is then introduced into the Inception_V3 network to strengthen the extraction of key features and make the network focus on their details. Finally, the feature information is fed into an LSTM network, whose memory function supports action recognition and classification. The proposed multi-model fusion algorithm is tested and verified on the publicly available UT-Interaction dataset. The experimental results show that the accuracy of interactive behavior recognition reaches 85.1%, indicating that the multi-model fusion method achieves higher accuracy.
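The channel attention step can be illustrated with a minimal, framework-free sketch of the squeeze-and-excitation (SE) mechanism the abstract refers to: global-average-pool each channel, pass the pooled vector through two fully connected layers with ReLU and sigmoid, then rescale each channel by its attention weight. The weight matrices and shapes below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def squeeze_excite(feature_map, w1, w2):
    """Toy SE channel attention.

    feature_map: list of C channels, each an H x W list of lists.
    w1: hidden_dim x C weight matrix (excitation layer 1, ReLU).
    w2: C x hidden_dim weight matrix (excitation layer 2, sigmoid).
    All weights here are hypothetical; a real SE block learns them.
    """
    # Squeeze: global average pooling per channel -> vector of length C
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation: FC -> ReLU
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w1]
    # FC -> sigmoid, giving one attention weight in (0, 1) per channel
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w2]
    # Scale: reweight every value of each channel by its attention weight
    return [[[v * weights[c] for v in row] for row in ch]
            for c, ch in enumerate(feature_map)]
```

In the method described above, a block like this would sit inside Inception_V3 so that informative channels are amplified before the features reach the LSTM.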
