A review of video action recognition based on 3D convolution

Xiankai Huang, Zhibin Cai

doi:10.1016/j.compeleceng.2023.108713

Abstract

Video action recognition is one of the topics in video understanding. Over the past decade, it has made great progress thanks to the emergence of deep learning, and in particular to the application of 3D convolution, which has further improved recognition accuracy. However, three challenges remain: long-range video features are difficult to capture, computational costs are high, and methods are difficult to compare because they are evaluated on different benchmarks. In view of these three challenges, this paper summarizes and analyzes existing video action recognition methods based on 3D convolution to help new researchers understand the field. Our contribution has three parts. First, we introduce the classical video action recognition methods based on 3D convolution and point out two problems with these methods. Second, we summarize the existing improved methods based on 3D convolution and the popular datasets, and we compare and analyze the experimental results of these methods on the benchmarks. Finally, we discuss current challenges in video action recognition and analyze future development trends.

Full Text