This paper presents a novel approach to recognizing human behavior from RGB-D video data. A three-dimensional convolutional restricted Boltzmann machine (3DCRBM) is proposed that extracts features directly from the raw RGB-D data. Structurally, the 3DCRBM differs from the restricted Boltzmann machine (RBM) in that its weights are shared across all locations in the input, which preserves spatial locality. Adjacent frames of the RGB image and the corresponding adjacent frames of the depth image form the input to the 3DCRBM, and multiple 3D convolutional kernels are then applied to these four frames to extract spatio-temporal features. For the human behavior recognition experiment, a deep belief network (DBN) is built from a 3DCRBM layer, a convolutional neural network (CNN), and a back-propagation (BP) network. The 3DCRBM is trained without supervision to extract features, while the CNN and the BP network are trained with supervision to classify the behavior. The experimental results show a correct recognition rate of 95.7%, which supports the effectiveness of the approach.
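To make the input arrangement concrete, the following is a minimal sketch, not the authors' implementation. It assumes each frame has been reduced to a single channel, that the four frames are stacked in the order RGB(t), RGB(t+1), depth(t), depth(t+1), and that hidden units use the sigmoid activation conventional for convolutional RBMs. The frame size, kernel size, kernel count, and the names `conv3d_valid` and `hidden_probabilities` are hypothetical choices made only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv3d_valid(volume, kernel):
    """Valid 3D cross-correlation of a (T, H, W) volume with a (kt, kh, kw) kernel.

    The same kernel weights are reused at every position, which is the
    weight sharing that distinguishes a convolutional RBM from a plain RBM.
    """
    T, H, W = volume.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                patch = volume[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(patch * kernel)
    return out

def hidden_probabilities(volume, kernels, biases):
    """Activation probability of each hidden feature map given the visible volume."""
    maps = [sigmoid(conv3d_valid(volume, k) + b) for k, b in zip(kernels, biases)]
    return np.stack(maps)

rng = np.random.default_rng(0)
H, W = 16, 16  # assumed frame size

# Random stand-ins for two adjacent RGB frames (assumed single-channel)
# and the two corresponding depth frames.
rgb_t, rgb_t1 = rng.random((H, W)), rng.random((H, W))
depth_t, depth_t1 = rng.random((H, W)), rng.random((H, W))

# Stack the four frames into one (4, H, W) volume (assumed ordering).
volume = np.stack([rgb_t, rgb_t1, depth_t, depth_t1])

# Three 3D kernels spanning 2 frames and a 5x5 spatial window (assumed sizes).
kernels = rng.normal(scale=0.01, size=(3, 2, 5, 5))
biases = np.zeros(3)

features = hidden_probabilities(volume, kernels, biases)
print(features.shape)  # (3, 3, 12, 12)
```

Each of the three kernels yields one spatio-temporal feature map. In a full 3DCRBM these probabilities would feed the unsupervised training step, which this sketch omits.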