Abstract
Action recognition is an important but challenging task in computer vision. The 3D convolutional neural network (3D CNN) is a mainstream method for action recognition because it extracts temporal and spatial information from video simultaneously. However, 3D CNNs have a serious drawback: their parameter count is very large. Depthwise convolution is a form of group convolution that effectively reduces the number of parameters in a convolution kernel, and it has been widely applied in 2D convolutional neural networks. We therefore propose introducing depthwise convolution into the 3D convolutional neural network. We choose 3D ResNet as our baseline and construct our model by replacing the 3D convolution kernels in the baseline with depthwise convolutions; we name the proposed model the depthwise separable network (DSN). We conduct experiments on the UCF101 and HMDB51 datasets. The experimental results show that, by introducing depthwise convolution, DSN not only reduces the number of parameters relative to the baseline but also moderately improves accuracy.
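To make the parameter argument concrete, the following minimal sketch counts the weights of a standard 3D convolution and of a depthwise convolution followed by a 1x1x1 pointwise convolution. It is an illustration under stated assumptions, not the DSN implementation: the pointwise stage, the 64-channel width, the 3x3x3 kernel, and the helper names `conv3d_params` and `depthwise_separable3d_params` are all hypothetical choices made for this example.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), groups=1, bias=False):
    """Count learnable parameters of a (grouped) 3D convolution layer."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    kt, kh, kw = kernel
    # Each output channel only sees c_in / groups input channels.
    weights = c_out * (c_in // groups) * kt * kh * kw
    return weights + (c_out if bias else 0)


def depthwise_separable3d_params(c_in, c_out, kernel=(3, 3, 3), bias=False):
    """Depthwise 3D conv (groups == c_in) plus a 1x1x1 pointwise conv.

    Assumption: the pointwise stage is the usual way to mix channels after a
    depthwise convolution; the abstract does not spell out this detail.
    """
    depthwise = conv3d_params(c_in, c_in, kernel, groups=c_in, bias=bias)
    pointwise = conv3d_params(c_in, c_out, (1, 1, 1), bias=bias)
    return depthwise + pointwise


# Illustrative layer size (assumed, not taken from the paper).
standard = conv3d_params(64, 64)
separable = depthwise_separable3d_params(64, 64)

print(f"standard 3D conv:        {standard} parameters")
print(f"depthwise separable 3D:  {separable} parameters")
print(f"reduction factor:        {standard / separable:.1f}x")
```

For this assumed layer, the standard convolution needs 64 x 64 x 27 weights, while the separable version needs 64 x 27 for the depthwise stage plus 64 x 64 for the pointwise stage. The saving for a full network depends on which layers are replaced, so these numbers do not represent the reduction reported for DSN.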