Abstract

Skeleton-based recognition of human actions has received increasing attention in recent years owing to the popularity of 3D acquisition sensors. Existing studies use 3D skeleton data from video clips captured from several views. The observed body orientation shifts relative to the camera as humans perform actions, resulting in unstable and noisy skeleton data. In this paper, we developed a view-adaptive (VA) mechanism that identifies the viewpoints across a sequence and transforms the skeleton view through a data-driven learning process to counteract the influence of view variations. Most existing methods reposition skeletons using a fixed, human-defined prior criterion; in contrast, we utilised an unsupervised repositioning approach and jointly designed a VA neural network based on the graph neural network (GNN). Our VA-GNN model transforms skeletons observed from distinct views into a considerably more consistent virtual view, improving on fixed preprocessing approaches. The VA module learns the most suitable observation view and transforms the skeletons of the action sequence accordingly, while the adaptive GNN learns a suited graph topology, enabling end-to-end recognition. Our strategy thus reduces the influence of view variance, allowing the network to focus on learning action-specific features and resulting in improved performance. Experiments on four benchmark datasets demonstrate the accuracy of the proposed approach.
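To make the mechanism concrete, the sketch below pairs a small view-regression subnetwork with a graph convolution over a learnable adjacency matrix, so that the virtual observation view and the graph topology are both trained end-to-end from the classification loss. This is a minimal PyTorch sketch of the general idea, not the paper's exact architecture: the pooled-sequence view regressor, the Euler-angle parameterisation, the single graph-convolution layer, and all layer sizes are assumptions made for illustration.

```python
# Minimal, illustrative sketch of a view-adaptive GNN. All module names,
# layer sizes, and the Euler-angle parameterisation are assumptions for
# illustration only.
import torch
import torch.nn as nn


def rotation_matrix(angles: torch.Tensor) -> torch.Tensor:
    """Build a batch of 3x3 rotation matrices from Euler angles (N, 3)."""
    a, b, c = angles[:, 0], angles[:, 1], angles[:, 2]
    zeros, ones = torch.zeros_like(a), torch.ones_like(a)
    rx = torch.stack([ones, zeros, zeros,
                      zeros, a.cos(), -a.sin(),
                      zeros, a.sin(), a.cos()], dim=1).view(-1, 3, 3)
    ry = torch.stack([b.cos(), zeros, b.sin(),
                      zeros, ones, zeros,
                      -b.sin(), zeros, b.cos()], dim=1).view(-1, 3, 3)
    rz = torch.stack([c.cos(), -c.sin(), zeros,
                      c.sin(), c.cos(), zeros,
                      zeros, zeros, ones], dim=1).view(-1, 3, 3)
    return rz @ ry @ rx


class ViewAdaptiveGNN(nn.Module):
    """Predicts a rotation/translation for the sequence, re-observes the
    skeleton from that learned viewpoint, then applies one graph
    convolution with a learnable adjacency matrix."""

    def __init__(self, num_joints: int, hidden: int = 64, num_classes: int = 60):
        super().__init__()
        # Regressor mapping a temporally pooled sequence to 3 rotation
        # angles plus a 3-vector translation (6 view parameters).
        self.view_regressor = nn.Sequential(
            nn.Linear(num_joints * 3, hidden), nn.ReLU(), nn.Linear(hidden, 6)
        )
        # Learnable adjacency: the graph topology is optimised jointly
        # with recognition rather than fixed by a human-defined prior.
        self.adjacency = nn.Parameter(torch.eye(num_joints))
        self.gcn = nn.Linear(3, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, 3) skeleton coordinates.
        n, t, v, _ = x.shape
        pooled = x.mean(dim=1).reshape(n, -1)          # (N, V*3)
        params = self.view_regressor(pooled)           # (N, 6)
        rot = rotation_matrix(params[:, :3])           # (N, 3, 3)
        trans = params[:, 3:].view(n, 1, 1, 3)
        # Re-observe every joint from the learned virtual viewpoint.
        x = torch.einsum('nij,ntvj->ntvi', rot, x - trans)
        # One graph-convolution step: mix joints via the learned adjacency.
        h = torch.einsum('uv,ntvc->ntuc', self.adjacency.softmax(dim=-1), x)
        h = torch.relu(self.gcn(h)).mean(dim=(1, 2))   # pool frames and joints
        return self.classifier(h)


# Example: a batch of 8 sequences, 20 frames, 25 joints (NTU-style skeleton).
model = ViewAdaptiveGNN(num_joints=25)
logits = model(torch.randn(8, 20, 25, 3))  # shape (8, 60)
```

The design point this sketch captures is that the view parameters and the adjacency matrix both receive gradients from the recognition loss, so the network discovers a consistent virtual view and a suited topology jointly instead of relying on a fixed preprocessing step.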
