Abstract

In this paper, we propose an approach to recognizing human actions by fusing RGB and depth data. First, Motion History Images (MHIs) are generated from the RGB videos to capture the temporal information of each action. The original depth data is then represented as 3D point clouds and rotated, and three Depth Motion Maps (DMMs) are generated over the entire depth sequence, corresponding to the front, side and top projection views. A four-channel deep Convolutional Neural Network is trained, where the first channel classifies the MHIs and the remaining three handle the front, side and top views generated from the depth data, respectively. The proposed method is evaluated on the publicly available UTD-MHAD dataset, which contains both RGB and depth videos. Experimental results show that combining the two modalities gives better recognition accuracy than using either modality individually.
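
As a rough illustration of the two feature maps described above, the NumPy sketch below computes an MHI from grayscale RGB frames and three DMMs from a depth sequence. It is only a minimal sketch, not the paper's implementation: the decay parameter tau, the difference threshold, the depth range and the number of depth bins are illustrative assumptions, and the occupancy-grid side/top projections are a simplification of the point-cloud rotation and projection step described in the abstract.

import numpy as np

def motion_history_image(gray_frames, tau=30, diff_thresh=25):
    # Motion History Image: recently moved pixels are bright and older motion
    # decays linearly by one step per frame. tau and diff_thresh are assumed values.
    mhi = np.zeros(gray_frames[0].shape, dtype=np.float32)
    prev = gray_frames[0].astype(np.float32)
    for frame in gray_frames[1:]:
        cur = frame.astype(np.float32)
        moved = np.abs(cur - prev) > diff_thresh        # binarised frame difference
        mhi = np.where(moved, float(tau), np.maximum(mhi - 1.0, 0.0))
        prev = cur
    return mhi / tau                                    # normalise to [0, 1]

def _project_views(depth, z_bins=64, z_max=4000.0):
    # Project one depth frame (H x W, e.g. millimetres) onto the front (x-y),
    # side (y-z) and top (z-x) planes. Side/top are occupancy grids over
    # quantised depth so each projection stays a plain 2-D image.
    h, w = depth.shape
    z = np.clip((depth.astype(np.float32) / z_max * (z_bins - 1)).astype(int),
                0, z_bins - 1)
    front = depth.astype(np.float32)
    side = np.zeros((h, z_bins), dtype=np.float32)
    top = np.zeros((z_bins, w), dtype=np.float32)
    ys, xs = np.nonzero(depth > 0)                      # skip holes / background zeros
    side[ys, z[ys, xs]] = 1.0
    top[z[ys, xs], xs] = 1.0
    return front, side, top

def depth_motion_maps(depth_frames, z_bins=64):
    # Depth Motion Maps: accumulate absolute inter-frame differences of each
    # projection over the whole sequence, one map per view (front, side, top).
    prev_views = _project_views(depth_frames[0], z_bins)
    dmms = [np.zeros_like(v) for v in prev_views]
    for frame in depth_frames[1:]:
        cur_views = _project_views(frame, z_bins)
        for i, (p, c) in enumerate(zip(prev_views, cur_views)):
            dmms[i] += np.abs(c - p)
        prev_views = cur_views
    return dmms                                         # [DMM_front, DMM_side, DMM_top]

In a pipeline following the abstract, the MHI and the three DMMs would each be resized to the network input size and fed to one of the four CNN channels before fusion.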
