Abstract

Skeleton-based human action recognition has significant applications in human-computer interaction and in recognizing humans in surveillance video. However, the task suffers from major challenges such as view variance and noise in the data, which limit recognition performance. This paper addresses these problems by adopting a sequence-based view-invariant transform to effectively represent the spatio-temporal information in the skeletal data. Human action recognition is performed in three stages. First, the raw 3D skeletal joint data obtained from the Microsoft Kinect sensor are transformed with the sequence-based view-invariant transform to eliminate the effect of view variations on the spatio-temporal data. Second, the transformed joint locations are converted to RGB images by a color-coding technique, forming transformed joint location maps (TJLMs). Third, discriminative features are extracted from the TJLMs by a novel CNN architecture, which performs the recognition task by means of class scores. Extensive experiments on four challenging 3D action datasets consistently show the superiority of the proposed method, whose performance is compared with that of other state-of-the-art methods.
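The first two stages can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the transform or the color coding, so the code assumes a translation-only normalization (centering on a root joint in the first frame) in place of the sequence-based view-invariant transform, and a global min-max scaling of the x, y, z coordinates into the R, G, B channels. The function names, the root joint index, and the array layout are all hypothetical.

```python
import numpy as np


def center_on_first_frame(seq, root=0):
    """Translate a (T, J, 3) sequence so the root joint of frame 0 is the origin.

    Assumption: a translation-only stand-in for the paper's sequence-based
    view-invariant transform, which the abstract does not describe in detail.
    """
    return seq - seq[0, root]


def joints_to_rgb(seq):
    """Color-code a (T, J, 3) sequence as a (J, T, 3) uint8 image.

    Assumption: x, y, z are scaled together (global min-max) into [0, 255]
    and stored in the R, G, B channels; rows are joints, columns are frames.
    """
    lo, hi = seq.min(), seq.max()
    scale = hi - lo if hi > lo else 1.0  # avoid division by zero
    img = np.floor(255.0 * (seq - lo) / scale).astype(np.uint8)
    return img.transpose(1, 0, 2)


# Synthetic stand-in for Kinect data: 40 frames, 20 joints, 3D coordinates.
rng = np.random.default_rng(0)
seq = rng.normal(size=(40, 20, 3))

centered = center_on_first_frame(seq)
image = joints_to_rgb(centered)
print(image.shape, image.dtype)
```

An image produced this way could then be fed to a CNN classifier, as in the third stage; the network architecture itself is not sketched here because the abstract gives no details about it.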
