Abstract

The use of depth data in computer vision applications has gained popularity with the availability of inexpensive depth sensors. In this study, a deep-learning-based method is proposed for recognizing single-person and dyadic (two-person) actions. Depth data of each action are used to construct a three-dimensional (3D) template. The template is rotated in different directions, and two-dimensional (2D) views from different angles are stored. The acquired 2D images are fed to a pre-trained AlexNet convolutional neural network (Krizhevsky, Sutskever, & Hinton, 2012) for deep feature extraction. Deep features are extracted for all viewpoints and concatenated, and a random forest classifier is then used for action recognition. The contributions of this paper are as follows. First, a 3D isosurface model is constructed to represent each action sequence. Second, we project the 3D shapes into a 2D feature space by taking snapshots from different views and feeding them to a pre-trained CNN for feature extraction. We obtain additional information about the actions by extracting features from different layers of the deep network and compare the results of the features extracted from these layers. Various classifiers are trained with the extracted deep features. Because of the complex structure of 3D shapes, only a limited number of feature extraction methods are applicable to them. Our method achieves performance close to the state of the art on several single-person and two-person action datasets.
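The following is a minimal sketch of the multi-view feature extraction and classification pipeline outlined above, assuming the 2D snapshots of each 3D action template are already rendered to image files. It uses torchvision's AlexNet and scikit-learn's RandomForestClassifier as stand-ins; the choice of the fc7 layer, the forest hyperparameters, and the file names are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: multi-view deep feature extraction + random forest classification.
# Each action sample is assumed to be a list of 2D snapshot images (one per viewpoint).
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image
from sklearn.ensemble import RandomForestClassifier

# Pre-trained AlexNet; here features are taken from the penultimate fully
# connected layer (fc7). The paper compares features from several layers.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.eval()
fc7 = torch.nn.Sequential(*list(alexnet.classifier.children())[:6])  # up to fc7

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def view_features(image_path):
    """Extract a deep feature vector for one 2D snapshot of the 3D template."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        conv = alexnet.avgpool(alexnet.features(img)).flatten(1)
        return fc7(conv).squeeze(0).numpy()  # 4096-dim fc7 features

def action_descriptor(view_paths):
    """Concatenate deep features over all viewpoints of one action sample."""
    return np.concatenate([view_features(p) for p in view_paths])

# Hypothetical training data: one sample = snapshot paths from several views.
train_samples = [["walk_view0.png", "walk_view1.png", "walk_view2.png"]]  # placeholder
train_labels = ["walk"]                                                   # placeholder

X = np.stack([action_descriptor(s) for s in train_samples])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, train_labels)
```

Concatenating the per-view features keeps the descriptor order-sensitive to the viewpoint arrangement, which is consistent with storing views from a fixed set of rotation angles as described in the abstract.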
