Comparative Analysis of AI-powered Approaches for Skeleton-based Child and Adult Action Recognition in Multi-person Environment

W.K.M Mithsara

doi:10.1109/csase51777.2022.9759717

Abstract

Machine learning on graphs provides a distinctive framework for handling multiple interrelated objects. Action recognition has been studied extensively, and most current action recognition systems are skeleton-based, with GNN-based techniques proving particularly effective. In GNN-based techniques, the skeleton is represented as a graph whose nodes are the joints and whose edges are the bones, and the temporal relationships between skeleton points can be captured frame by frame. Most studies focus on detecting a single action performed by one individual in a well-segmented video, rather than several actions performed simultaneously by multiple persons in an untrimmed video. Many action recognition systems also focus on adult actions, and no dedicated techniques exist for child action recognition. Recognizing children's actions is further complicated by the lack of a standard dataset for child action recognition. This study aims to recognize a child's actions in a multi-person scenario. First, the method selects a minor domain action. The child is then identified in the video using custom object detection with YOLOv5. ANN, 1DCNN, LSTM, and GNN (STGCN) models are compared on skeletal data to recognize the actions. The ANN method did not capture the temporal relationship between frames; consequently, the GNN-based technique was employed to recognize behaviours in the multi-person setting. Adult skeleton data was extracted from the standard action datasets KTH and NTU RGB+D 120 using AlphaPose, and cropped YouTube videos were used for child action recognition. The GNN-based technique outperforms the other approaches in both efficiency and accuracy.
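To make the "skeleton as a graph" idea concrete, the following is a minimal sketch of one simplified spatial-temporal graph convolution step, not the author's STGCN implementation. Several details are assumptions made for illustration: the 5-joint skeleton and its bones (`NUM_JOINTS`, `BONES`) are hypothetical and do not follow the AlphaPose keypoint layout, the adjacency is a single symmetric-normalized matrix with self-loops, and the spatial weight matrix and temporal kernel are fixed numbers standing in for parameters a real model would learn. Each frame's joint features are first mixed along the bones (spatial step, `A_hat . H . W` followed by ReLU), and the resulting sequence is then convolved along the time axis (temporal step).

```python
import math

# Hypothetical toy skeleton (NOT the AlphaPose keypoint layout):
# nodes are joints, edges are bones.
NUM_JOINTS = 5  # 0 head, 1 neck, 2 hip, 3 left hand, 4 right hand
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def normalized_adjacency(num_nodes, edges):
    """Build D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loop so each joint keeps its own features
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(x, y):
    """Plain-Python matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def spatial_graph_conv(adj, frame, weight):
    """One spatial step on a single frame: H' = A_hat . H . W, then ReLU."""
    out = matmul(matmul(adj, frame), weight)
    return [[max(0.0, v) for v in row] for row in out]


def temporal_conv(frames, kernel):
    """Convolve each joint/channel along time with a 1D kernel ('valid' mode)."""
    k = len(kernel)
    n, c = len(frames[0]), len(frames[0][0])
    return [[[sum(kernel[s] * frames[t + s][i][ch] for s in range(k))
              for ch in range(c)] for i in range(n)]
            for t in range(len(frames) - k + 1)]


def st_gcn_block(frames, edges, weight, kernel):
    """Spatial graph conv on every frame, followed by a temporal conv."""
    adj = normalized_adjacency(len(frames[0]), edges)
    spatial = [spatial_graph_conv(adj, f, weight) for f in frames]
    return temporal_conv(spatial, kernel)


if __name__ == "__main__":
    # Hypothetical input: 4 frames x 5 joints x (x, y) coordinates.
    frames = [[[0.5, 0.1 * t], [0.5, 0.3], [0.5, 0.7], [0.3, 0.4], [0.7, 0.4]]
              for t in range(4)]
    weight = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # 2 -> 3 channels (would be learned)
    kernel = [1 / 3, 1 / 3, 1 / 3]  # 3-frame temporal window (would be learned)
    out = st_gcn_block(frames, BONES, weight, kernel)
    print(len(out), len(out[0]), len(out[0][0]))  # frames out, joints, channels
```

With 4 input frames and a 3-frame temporal kernel in "valid" mode, the block yields 2 output frames, each with 5 joints and 3 channels. A full STGCN-style model would stack several such blocks, learn the weights and temporal kernels, and typically use a richer adjacency; this toy omits all of that to show only how bones drive the spatial mixing and how frames are linked in time.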
