Table Tennis Action Recognition Method Based on Multimodal Data and Optimized ST-GCN
To address the low recognition rate caused by the difficulty of capturing high-speed and subtle movements in table tennis, this work proposes an action recognition method based on multimodal data and an optimized Spatial-Temporal Graph Convolutional Network (ST-GCN). The model introduces a Multi-Level Graph Convolutional Network (ML-GCN) architecture with cross-level feature extraction channels, which capture the spatiotemporal correlations between local subtle movements and global trajectories. A built-in hybrid attention mechanism focuses on key skeletal joints and core motion frames through adaptive weight assignment. Combined with a multimodal fusion strategy for visual signals and inertial sensor data, it significantly improves the model's robustness under line-of-sight occlusion and motion blur. Experiments on a self-built multimodal table tennis dataset show that the method achieves an accuracy of 88.2%, a recall of 89.5%, and an F1-score of 88.3%. This performance is significantly better than that of the original ST-GCN and existing mainstream action recognition algorithms, which confirms the central role of each optimization module in improving feature representation capability and computational efficiency. The study provides an efficient technical solution for the intelligent analysis of complex sports movements.
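The hybrid attention idea (adaptive weights over skeletal joints and over frames) can be illustrated with a minimal sketch. It assumes features stored as `x[t][v][c]` (frames × joints × channels) and scores each joint and frame by its mean absolute activation; that scoring rule and the names `hybrid_attention` and `softmax` are hypothetical stand-ins for the learned attention branches, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_attention(x):
    """Reweight a skeleton feature sequence x[t][v][c] (frames x joints x channels).

    Joint and frame weights are each a softmax over a simple activity score
    (mean absolute activation). This scoring is a hypothetical stand-in for
    learned attention branches.
    """
    T, V, C = len(x), len(x[0]), len(x[0][0])
    # Joint attention: how active is each joint across all frames and channels?
    joint_scores = [
        sum(abs(x[t][v][c]) for t in range(T) for c in range(C)) / (T * C)
        for v in range(V)
    ]
    # Temporal attention: how active is each frame across all joints and channels?
    frame_scores = [
        sum(abs(x[t][v][c]) for v in range(V) for c in range(C)) / (V * C)
        for t in range(T)
    ]
    a_joint = softmax(joint_scores)
    a_frame = softmax(frame_scores)
    # Scale by V * T so that uniform weights leave the input unchanged.
    out = [
        [[x[t][v][c] * a_joint[v] * a_frame[t] * V * T for c in range(C)] for v in range(V)]
        for t in range(T)
    ]
    return out, a_joint, a_frame


# Two frames, three joints, one channel: joint 1 moves, and more so in frame 1.
seq = [[[0.0], [1.0], [0.0]], [[0.0], [3.0], [0.0]]]
weighted, joint_w, frame_w = hybrid_attention(seq)
```

Multiplying the two weight vectors keeps joint and temporal attention separable, which is one common way to combine them; a trained model would typically learn the scores with small projection layers instead.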
- Research Article
21
- 10.3390/app12010004
- Dec 21, 2021
- Applied Sciences
In this paper, we propose a new method for detecting abnormal human behavior based on skeleton features using self-attention augmented graph convolution. Skeleton data have been shown to be robust to complex backgrounds, illumination changes, and dynamic camera scenes, and are naturally represented as a graph in non-Euclidean space. In particular, spatial temporal graph convolutional networks (ST-GCN) can effectively learn the spatio-temporal relationships of non-Euclidean structured data. However, ST-GCN operates only on local neighborhood nodes and therefore lacks global information. We propose a novel spatial temporal self-attention augmented graph convolutional network (SAA-Graph) that combines an improved spatial graph convolution operator with a modified transformer self-attention operator to capture both local and global information of the joints. The spatial self-attention augmented module is used to capture the intra-frame relationships between human body parts. To the best of our knowledge, we are the first group to use self-attention for video anomaly detection by enhancing spatial temporal graph convolution. Moreover, to validate the proposed model, we performed extensive experiments on two large-scale public benchmark datasets (the ShanghaiTech Campus and CUHK Avenue datasets), which show state-of-the-art performance for the proposed approach compared with existing skeleton-based and graph convolution methods.
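The contrast between local graph convolution and global self-attention can be sketched in plain Python. This is a minimal illustration under stated assumptions: the local term uses the common GCN normalization D^-1/2 (A + I) D^-1/2, the attention uses identity query/key/value projections, and the two terms are simply summed. The function names are hypothetical and this is not the SAA-Graph operator itself.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize_adjacency(adj):
    """GCN-style symmetric normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def self_attention(x):
    """Scaled dot-product attention across joints, with identity Q/K/V projections."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        # Every joint contributes, whether or not it is a skeletal neighbour.
        out.append([sum(w[j] / z * x[j][c] for j in range(len(x))) for c in range(d)])
    return out


def saa_layer(x, adj):
    """Local graph convolution (neighbours only) plus global self-attention (all joints)."""
    local = matmul(normalize_adjacency(adj), x)
    global_ctx = self_attention(x)
    return [[l + g for l, g in zip(lr, gr)] for lr, gr in zip(local, global_ctx)]


# Three joints in a chain 0-1-2; joints 0 and 2 are not directly connected.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
out = saa_layer(x, adj)
```

In this toy chain, the graph-convolution term for joint 0 receives nothing from joint 2, while the attention term does; that gap is the "lacks global information" limitation the abstract describes.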
- Research Article
35
- 10.1109/access.2021.3052246
- Jan 1, 2021
- IEEE Access
The core purpose of artificial emotional intelligence is to recognize human emotions. Technologies such as facial, semantic, and brainwave recognition have been widely proposed. However, these recognition techniques for emotional features require a large number of training samples to achieve high accuracy. Human behaviour patterns can be learned and recognized from continuous movement with the Spatial Temporal Graph Convolutional Network (ST-GCN). However, ST-GCN does not take movement speed into account, so the speed of human behaviour and subtle changes in emotion cannot be effectively distinguished. This paper proposes training a Spatial Temporal Variation Graph Convolutional Network for human emotion recognition: skeleton detection is used to calculate the degree of skeleton point change between consecutive actions, a nearest neighbour algorithm classifies speed levels, and the ST-GCN recognition model is trained to obtain the emotional state. Applying the speed change recognition ability of the Spatial Temporal Variation Graph Convolution Network (STV-GCN) to artificial emotional intelligence makes it possible to efficiently recognize the subtle actions associated with happiness, sadness, fear, and anger in human behaviour. Compared with ST-GCN, the proposed STV-GCN improves recognition accuracy by more than 50%.
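The speed-level step (skeleton point change between consecutive frames, then nearest-neighbour classification) can be sketched as follows. The sketch assumes 2-D joint coordinates, measures change as the mean Euclidean displacement of all joints, and classifies the clip's mean change with a 1-nearest-neighbour lookup. The reference speeds and labels in `REFERENCES` are hypothetical, not values from the paper.

```python
import math


def skeleton_change(frames):
    """Mean Euclidean displacement of all joints between consecutive frames.

    frames: list of frames, each a list of (x, y) joint coordinates.
    Returns one value per consecutive pair of frames.
    """
    changes = []
    for prev, cur in zip(frames, frames[1:]):
        dists = [math.dist(p, c) for p, c in zip(prev, cur)]
        changes.append(sum(dists) / len(dists))
    return changes


def nearest_neighbour_level(value, references):
    """1-nearest-neighbour: label of the reference change value closest to `value`."""
    return min(references, key=lambda ref: abs(value - ref[0]))[1]


def speed_level(frames, references):
    """Classify a clip by the mean of its frame-to-frame skeleton change."""
    changes = skeleton_change(frames)
    return nearest_neighbour_level(sum(changes) / len(changes), references)


# Hypothetical labelled reference speeds: (change per frame, label).
REFERENCES = [(0.5, "slow"), (2.0, "medium"), (5.0, "fast")]

# Two joints, each moving 2 units per frame along x.
clip = [[(0, 0), (1, 0)], [(2, 0), (3, 0)], [(4, 0), (5, 0)]]
print(speed_level(clip, REFERENCES))  # medium
```

`math.dist` requires Python 3.8 or later. In practice, the resulting speed level would be one additional input alongside the skeleton sequence fed to the recognition model.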
- Research Article
13
- 10.3390/s20185260
- Sep 15, 2020
- Sensors
In the skeleton-based human action recognition domain, spatial-temporal graph convolution networks (ST-GCNs) have recently made great progress. However, they use only one fixed temporal convolution kernel, which is not enough to extract temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not optimal. To this end, we propose a novel enhanced spatial and extended temporal graph convolutional network (EE-GCN). Three convolution kernels of different sizes are chosen to extract discriminative temporal features from shorter to longer terms. The corresponding GCLs are then concatenated by a powerful yet efficient one-shot aggregation (OSA) + effective squeeze-excitation (eSE) structure. The OSA module aggregates the features from each layer once into the output, and the eSE module explores the interdependency between the channels of the output. In addition, we propose a new connection paradigm to enhance the spatial features, which expands the serial connection into a combination of serial and parallel connections by adding a spatial GCL in parallel with the temporal GCLs. The proposed method is evaluated on three large-scale datasets, and the experimental results show that its performance exceeds that of previous state-of-the-art methods.
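The two ingredients named above, temporal kernels of several sizes and eSE channel gating, can be sketched on a single joint trajectory. This sketch assumes box (averaging) kernels of sizes 3, 5 and 7 in place of learned weights, and an eSE block made of global average pooling, one fully connected layer, and a hard sigmoid (`relu6(v + 3) / 6`). It omits the OSA aggregation and the graph convolutions, and the function names are hypothetical.

```python
def temporal_conv_same(seq, kernel):
    """1-D cross-correlation along time with zero padding (output length == input length)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(seq))]


def multi_scale_temporal(seq, sizes=(3, 5, 7)):
    """One output channel per kernel size: short, medium and long temporal context.

    Box (averaging) kernels stand in for learned weights.
    """
    return [temporal_conv_same(seq, [1.0 / k] * k) for k in sizes]


def hard_sigmoid(v):
    """Piecewise-linear sigmoid, relu6(v + 3) / 6."""
    return min(max((v + 3.0) / 6.0, 0.0), 1.0)


def effective_se(channels, fc_weights, fc_bias):
    """eSE-style gating: global average pool -> one FC layer -> hard sigmoid -> rescale."""
    pooled = [sum(ch) / len(ch) for ch in channels]
    gates = [hard_sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
             for row, b in zip(fc_weights, fc_bias)]
    return [[g * v for v in ch] for g, ch in zip(gates, channels)]


# An impulse at frame 3 of a 7-frame joint trajectory.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
branches = multi_scale_temporal(impulse)
# Hypothetical 3x3 FC weights (identity): each gate depends on its own channel mean.
gated = effective_se(branches, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                     [0.0, 0.0, 0.0])
```

The impulse response shows the design intent: the size-3 branch spreads a single-frame event over 3 frames and the size-7 branch over 7, so the concatenated channels cover both short and longer temporal cues.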
- Research Article
13
- 10.1016/j.applanim.2022.105594
- Mar 2, 2022
- Applied Animal Behaviour Science
Automatically recognizing four-legged animal behaviors to enhance welfare using spatial temporal graph convolutional networks
- Research Article
63
- 10.1109/tits.2023.3250424
- Aug 1, 2023
- IEEE Transactions on Intelligent Transportation Systems
Accurate spatial-temporal traffic modeling and prediction play an important role in intelligent transportation systems (ITS). Recently, various deep learning methods such as graph convolutional networks (GCNs) and recurrent neural networks (RNNs) have been widely adopted in traffic prediction tasks to extract spatial-temporal dependencies from large volumes of high-quality training data. However, some transportation networks suffer from data scarcity, and in these cases the performance of traditional GCN- and RNN-based approaches degrades sharply. To address this problem, this paper proposes an adversarial domain adaptation with spatial-temporal graph convolutional network (Ada-STGCN) model that predicts traffic indicators for a data-scarce target road network by transferring knowledge from a data-sufficient source road network. Specifically, Ada-STGCN first develops a spatial-temporal graph convolutional network that combines the GCN and the gated recurrent unit (GRU) to extract spatial-temporal dependencies from the source and target road networks. Then, adversarial domain adaptation is integrated with the spatial-temporal graph convolutional network to learn discriminative and domain-invariant features that facilitate knowledge transfer. Experimental results on real-world traffic datasets for the traffic flow prediction task demonstrate that our model yields the best prediction performance among the compared state-of-the-art baseline methods.
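The GCN + GRU backbone described above can be sketched for scalar node features. Each step first aggregates traffic readings over the normalized road graph and then applies a standard GRU update per road segment. The sketch uses one of the usual GRU conventions, `h' = (1 - z) * h + z * candidate`. The parameter dictionary and network are hypothetical, and the adversarial domain adaptation part is not shown.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def graph_aggregate(norm_adj, values):
    """One graph-convolution propagation on scalar node features: norm_adj @ values."""
    return [sum(a * v for a, v in zip(row, values)) for row in norm_adj]


def gcn_gru_step(norm_adj, x_t, h_prev, p):
    """Scalar GRU update per road segment, fed with graph-aggregated inputs."""
    g = graph_aggregate(norm_adj, x_t)
    h_new = []
    for gi, hi in zip(g, h_prev):
        z = sigmoid(p["wz"] * gi + p["uz"] * hi + p["bz"])  # update gate
        r = sigmoid(p["wr"] * gi + p["ur"] * hi + p["br"])  # reset gate
        cand = math.tanh(p["wh"] * gi + p["uh"] * (r * hi) + p["bh"])  # candidate state
        h_new.append((1.0 - z) * hi + z * cand)  # blend old state and candidate
    return h_new


def encode_sequence(norm_adj, xs, p):
    """Run the recurrent graph encoder over a sequence of traffic snapshots."""
    h = [0.0] * len(norm_adj)
    for x_t in xs:
        h = gcn_gru_step(norm_adj, x_t, h, p)
    return h


# Hypothetical two-segment network and parameters.
norm_adj = [[0.5, 0.5], [0.5, 0.5]]
params = {"wz": 0.8, "uz": 0.1, "bz": 0.0, "wr": 0.5, "ur": 0.5, "br": 0.0,
          "wh": 1.0, "uh": 0.5, "bh": 0.0}
state = encode_sequence(norm_adj, [[0.2, 0.4], [0.3, 0.5], [0.6, 0.1]], params)
```

The final hidden state per segment is what a prediction head, and in Ada-STGCN a domain discriminator, would consume.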
- Supplementary Content
- 10.48550/arxiv.2103.15449
- Mar 29, 2021
- Lirias (KU Leuven)
Freezing of gait (FOG) is a common and debilitating gait impairment in Parkinson's disease. Further insight into this phenomenon is hampered by the difficulty of objectively assessing FOG. To meet this clinical need, this paper proposes an automated motion-capture-based FOG assessment method driven by a novel deep neural network. Automated FOG assessment can be formulated as an action segmentation problem, where temporal models are tasked to recognize and temporally localize the FOG segments in untrimmed motion capture trials. This paper takes a closer look at the performance of state-of-the-art action segmentation models when tasked to automatically assess FOG. Furthermore, a novel deep neural network architecture is proposed that aims to better capture the spatial and temporal dependencies than the state-of-the-art baselines. The proposed network, termed multi-stage spatial-temporal graph convolutional network (MS-GCN), combines the spatial-temporal graph convolutional network (ST-GCN) and the multi-stage temporal convolutional network (MS-TCN). The ST-GCN captures the hierarchical spatial-temporal motion among the joints inherent to motion capture, while the multi-stage component reduces over-segmentation errors by refining the predictions over multiple stages. The experiments indicate that the proposed model outperforms four state-of-the-art baselines. Moreover, FOG outcomes derived from MS-GCN predictions had an excellent (r=0.93 [0.87, 0.97]) and moderately strong (r=0.75 [0.55, 0.87]) linear relationship with FOG outcomes derived from manual annotations. The proposed MS-GCN may provide an automated and objective alternative to labor-intensive clinician-based FOG assessment. Future work is now possible that aims to assess the generalization of MS-GCN to a larger and more varied verification cohort.
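How multi-stage refinement reduces over-segmentation can be shown with a toy example. Each stage takes the previous stage's per-frame class probabilities and refines them. Here the refinement is a temporal moving average followed by renormalization, a deliberate simplification of the dilated temporal convolution stages used in MS-TCN. The names and the toy probabilities are hypothetical.

```python
def refine_stage(probs, window=3):
    """One refinement stage: average class probabilities over a temporal window.

    probs: per-frame probability lists (frames x classes). A learned MS-TCN stage
    would apply dilated temporal convolutions here; the moving average is a stand-in.
    """
    T, C = len(probs), len(probs[0])
    half = window // 2
    out = []
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        avg = [sum(probs[s][c] for s in range(lo, hi)) / (hi - lo) for c in range(C)]
        total = sum(avg)
        out.append([a / total for a in avg])
    return out


def multi_stage(probs, num_stages=3, window=3):
    """Stack stages; each stage refines the previous stage's predictions."""
    stages = [probs]
    for _ in range(num_stages - 1):
        stages.append(refine_stage(stages[-1], window))
    return stages


def frame_labels(probs):
    """Arg-max class per frame."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def count_segments(labels):
    """Number of contiguous runs of the same label."""
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


# Class 0 = no FOG, class 1 = FOG; a single spurious FOG frame at t = 3.
initial = [[0.9, 0.1]] * 3 + [[0.1, 0.9]] + [[0.9, 0.1]] * 3
stages = multi_stage(initial)
print([count_segments(frame_labels(s)) for s in stages])  # [3, 1, 1]
```

The isolated one-frame FOG prediction splits the trial into three segments at the first stage and is absorbed after one refinement, which is the kind of over-segmentation error the multi-stage component targets.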
- Research Article
178
- 10.1109/tits.2021.3136287
- Sep 1, 2022
- IEEE Transactions on Intelligent Transportation Systems
While considering the spatial and temporal features of traffic, capturing the impacts of various external factors on travel is an essential step toward accurate traffic forecasting. However, existing studies seldom consider external factors, or neglect the effect of the complex correlations among external factors on traffic. Intuitively, knowledge graphs can naturally describe these correlations. Since knowledge graphs and traffic networks are essentially heterogeneous networks, integrating the information in both is challenging. Against this background, this study presents a knowledge representation-driven traffic forecasting method based on spatial-temporal graph convolutional networks. We first construct a knowledge graph for traffic forecasting and derive knowledge representations with a knowledge representation learning method named KR-EAR. Then, we propose the Knowledge Fusion Cell (KF-Cell) to combine the knowledge and traffic features as the input of a spatial-temporal graph convolutional backbone network. Experimental results on a real-world dataset show that our strategy enhances the forecasting performance of backbones at various prediction horizons. The ablation and perturbation analyses further verify the effectiveness and robustness of the proposed method. To the best of our knowledge, this is the first study that constructs and uses a knowledge graph to facilitate traffic forecasting; it also offers a promising direction for integrating external information and spatial-temporal information in traffic forecasting. The source code is available at https://github.com/lehaifeng/T-GCN/tree/master/KST-GCN.
- Research Article
44
- 10.1186/s12984-022-01025-3
- May 21, 2022
- Journal of NeuroEngineering and Rehabilitation
Background: Freezing of gait (FOG) is a common and debilitating gait impairment in Parkinson’s disease. Further insight into this phenomenon is hampered by the difficulty of objectively assessing FOG. To meet this clinical need, this paper proposes an automated motion-capture-based FOG assessment method driven by a novel deep neural network.
Methods: Automated FOG assessment can be formulated as an action segmentation problem, where temporal models are tasked to recognize and temporally localize the FOG segments in untrimmed motion capture trials. This paper takes a closer look at the performance of state-of-the-art action segmentation models when tasked to automatically assess FOG. Furthermore, a novel deep neural network architecture is proposed that aims to better capture the spatial and temporal dependencies than the state-of-the-art baselines. The proposed network, termed multi-stage spatial-temporal graph convolutional network (MS-GCN), combines the spatial-temporal graph convolutional network (ST-GCN) and the multi-stage temporal convolutional network (MS-TCN). The ST-GCN captures the hierarchical spatial-temporal motion among the joints inherent to motion capture, while the multi-stage component reduces over-segmentation errors by refining the predictions over multiple stages. The proposed model was validated on a dataset of fourteen freezers, fourteen non-freezers, and fourteen healthy control subjects.
Results: The experiments indicate that the proposed model outperforms four state-of-the-art baselines. Moreover, FOG outcomes derived from MS-GCN predictions had an excellent (r = 0.93 [0.87, 0.97]) and moderately strong (r = 0.75 [0.55, 0.87]) linear relationship with FOG outcomes derived from manual annotations.
Conclusions: The proposed MS-GCN may provide an automated and objective alternative to labor-intensive clinician-based FOG assessment. Future work is now possible that aims to assess the generalization of MS-GCN to a larger and more varied verification cohort.
- Research Article
2
- 10.3390/electronics11213498
- Oct 28, 2022
- Electronics
In recent years, spatial-temporal graph convolutional networks have played an increasingly important role in skeleton-based human action recognition. However, most ST-GCN-based approaches still have three major limitations: (1) They use only a single joint scale to extract action features, or process joint and skeletal information separately; as a result, action features cannot be extracted dynamically through the mutual directivity between scales. (2) They treat the contributions of all joints equally during training, overlooking the fact that joints whose loss is difficult to reduce are critical to network training. (3) They rely heavily on large amounts of labeled data, which remain costly. To address these problems, we propose a Tohjm-trained multiscale spatial-temporal graph convolutional neural network for semi-supervised action recognition, which contains three parts: an encoder, a decoder, and a classifier. The encoder’s core is a correlated joint–bone–body-part fusion spatial-temporal graph convolutional network that allows the network to learn more stable action features between coarse and fine scales. The decoder uses a self-supervised training method with a motion prediction head, which enables the network to extract action features from unlabeled data and thus achieve semi-supervised learning. In addition, the network is capable of fully supervised learning with the encoder, decoder, and classifier. Our proposed time-level online hard joint mining (Tohjm) strategy is also used in the decoder training process, which allows the network to focus on hard training joints and improves overall network performance. Experimental results on the NTU RGB+D and Kinetics-skeleton datasets show that the improved model achieves good performance for semi-supervised action recognition and is also applicable to fully supervised training.
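The idea behind online hard joint mining, concentrating the training loss on the joints that are hardest to predict, can be sketched as a loss function. In this sketch, "time-level" is interpreted as mining separately at each time step and then averaging over time, and the kept fraction `ratio` is a hypothetical hyperparameter. It illustrates the general technique only, not the paper's exact formulation.

```python
def hard_joint_loss(joint_losses, ratio=0.5):
    """Average the loss over only the hardest joints at one time step.

    joint_losses: per-joint losses (e.g. motion-prediction error of each joint).
    ratio: fraction of joints kept; a hypothetical hyperparameter.
    Returns (mined_loss, sorted indices of the hard joints).
    """
    k = max(1, round(len(joint_losses) * ratio))
    # Rank joints by loss, largest first; sorted() is stable for ties.
    order = sorted(range(len(joint_losses)), key=lambda j: joint_losses[j], reverse=True)
    hard = sorted(order[:k])
    return sum(joint_losses[j] for j in hard) / k, hard


def time_level_hard_joint_loss(losses, ratio=0.5):
    """Mine hard joints separately at every time step, then average over time.

    losses: losses[t][v] for time step t and joint v.
    """
    per_step = [hard_joint_loss(step, ratio)[0] for step in losses]
    return sum(per_step) / len(per_step)


loss, hard = hard_joint_loss([0.1, 0.9, 0.4, 0.2])
print(hard)  # [1, 2]
```

Because easy joints are excluded from the average, their gradients vanish at that step, so optimization effort shifts toward the joints whose loss is hard to reduce.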
- Research Article
9
- 10.20965/jaciii.2024.p0552
- May 20, 2024
- Journal of Advanced Computational Intelligence and Intelligent Informatics
Basketball has developed rapidly in recent years. Analysis of various moves in basketball can provide technical references for professional players and help referees judge games. Traditional technology can no longer provide modern basketball players with theoretical support; therefore, using intelligent methods to recognize human body postures in basketball is a relatively innovative approach. To recognize players' basketball postures more accurately, this study proposes a basketball posture recognition model based on enhanced graph convolutional networks (GCN), namely a model based on enhanced GCN and the spatial temporal graph convolutional network (ST-GCN). This model combines the respective advantages of GCN and the temporal convolutional network and can handle graph-structured data with time-series relationships. ST-GCN is obtained by defining the convolution operation on the graph structure and building a spatiotemporal graph convolution model for the sequence of a person’s body postures. A dataset of technical basketball actions is constructed to verify the effectiveness of the ST-GCN model. The experimental findings indicated that the final recognition accuracy of the ST-GCN model for basketball postures was approximately 95.58%, whereas that of the long short-term memory + multiview re-observation skeleton action recognition (LSTM+MV+AC) model was about 93.65%.
- Research Article
121
- 10.1049/cit2.12012
- Mar 17, 2021
- CAAI Transactions on Intelligence Technology
A spatial attentive and temporal dilated (SATD) GCN for skeleton‐based action recognition
- Conference Article
66
- 10.1109/iccvw.2019.00216
- Oct 1, 2019
Recent research has shown that modeling the dynamic joint features of the human body with a graph convolutional network (GCN) is a groundbreaking approach to skeleton-based action recognition, especially for recognizing body motion and human-object and human-human interactions. Nevertheless, how to model and use coherent skeleton information comprehensively is still an open problem. To capture rich spatiotemporal information and use features more effectively, we introduce a spatial temporal graph convolutional network enhanced with a spatial residual layer and a dense connection block. More specifically, our work has three aspects. First, we extend spatial graph convolution to spatial temporal graph convolution with a cross-domain residual to extract more precise and informative spatiotemporal features, and we reduce training complexity through feature fusion in the so-called spatial residual layer. Second, instead of simply stacking multiple similar layers, we use dense connections to take full advantage of global information. Third, we combine these two components to create a spatial temporal graph convolutional network (ST-GCN), referred to as SDGCN. The proposed graph representation has a new structure. We perform extensive experiments on two large datasets: Kinetics and NTU-RGB+D. Our method achieves a substantial performance improvement over mainstream methods, and we evaluate it both quantitatively and qualitatively to demonstrate its effectiveness.
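The two connection patterns named above, a residual shortcut around a spatial operator and dense connections between layers, can be sketched generically. This sketch treats features as flat lists and layers as arbitrary callables. `summarize` is a hypothetical toy layer, and the code does not reproduce SDGCN's actual graph-convolution layers.

```python
def dense_block(x, layers):
    """Dense connectivity: each layer receives the concatenation of the input and all
    earlier layer outputs, and its own output is appended to that running feature list."""
    features = list(x)
    for layer in layers:
        features = features + layer(features)
    return features


def spatial_residual(x, spatial_fn):
    """Residual shortcut around a spatial operator: y = x + f(x)."""
    return [a + b for a, b in zip(x, spatial_fn(x))]


def summarize(features):
    """Hypothetical toy 'layer': emits one feature, the sum of everything it sees."""
    return [sum(features)]


print(dense_block([1.0, 2.0], [summarize, summarize]))  # [1.0, 2.0, 3.0, 6.0]
print(spatial_residual([1.0, 2.0], lambda v: [0.5 * a for a in v]))  # [1.5, 3.0]
```

The second toy layer sees the original input as well as the first layer's output, which is the sense in which dense connections reuse earlier features instead of only stacking similar layers.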
- Research Article
23
- 10.3390/s23146318
- Jul 11, 2023
- Sensors
The construction industry is accident-prone, and unsafe behaviors of construction workers have been identified as a leading cause of accidents. One important countermeasure is monitoring and managing those unsafe behaviors. The most popular way of detecting and identifying workers' unsafe behaviors is a computer vision-based intelligent monitoring system. However, most existing research and products focus only on recognizing workers' behaviors (i.e., motions), and few studies consider the interactions between man and machine, man and material, or man and environment. From the standpoint of safety management, these interactions are very important for judging whether workers' behaviors are safe. This study aims to develop a new method of identifying construction workers' unsafe behaviors, i.e., unsafe interactions between man and machine/material, based on ST-GCN (Spatial Temporal Graph Convolutional Networks) and YOLO (You Only Look Once), which could provide more direct and valuable information for safety management. In this study, two trained YOLO-based models were used to detect, respectively, safety signs in the workplace and objects that interacted with workers. Then, an ST-GCN model was trained to detect and identify workers' behaviors. Lastly, a decision algorithm was developed that considers interactions between man and machine/material based on the YOLO and ST-GCN results. The developed method performed well: compared with using ST-GCN alone, accuracy improved significantly from 51.79% to 85.71%, 61.61% to 99.11%, and 58.04% to 100.00% for the identification of three kinds of behaviors, respectively: throwing (throwing a hammer, throwing a bottle), operating (turning on a switch, putting a bottle), and crossing (crossing a railing, crossing an obstacle). The findings have practical implications for safety management, especially for monitoring and managing workers' behavior.
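A decision step that combines a skeleton-based action label with object and safety-sign detections could look like the following rule-based sketch. The rule tables `INTERACTIONS` and `SIGN_PROHIBITS`, the class names, and the rule "unsafe if a detected sign prohibits the interaction" are all hypothetical illustrations. The paper's actual decision algorithm is not described in the abstract.

```python
# Hypothetical rules: (ST-GCN action class, YOLO object class) -> specific interaction.
INTERACTIONS = {
    ("throwing", "hammer"): "throwing hammer",
    ("throwing", "bottle"): "throwing bottle",
    ("operating", "switch"): "turning on switch",
    ("operating", "bottle"): "putting bottle",
    ("crossing", "railing"): "crossing railing",
    ("crossing", "obstacle"): "crossing obstacle",
}

# Hypothetical: interactions that each detected safety sign forbids.
SIGN_PROHIBITS = {
    "no_throwing": {"throwing hammer", "throwing bottle"},
    "no_crossing": {"crossing railing", "crossing obstacle"},
    "no_operation": {"turning on switch", "putting bottle"},
}


def identify_interaction(action, objects):
    """Combine the skeleton-based action with detected objects into one interaction label."""
    for obj in objects:
        label = INTERACTIONS.get((action, obj))
        if label is not None:
            return label
    return None


def judge(action, objects, signs):
    """Return (interaction, is_unsafe); unsafe if a detected sign prohibits the interaction."""
    interaction = identify_interaction(action, objects)
    if interaction is None:
        return None, False
    unsafe = any(interaction in SIGN_PROHIBITS.get(sign, set()) for sign in signs)
    return interaction, unsafe


print(judge("throwing", ["person", "hammer"], ["no_throwing"]))  # ('throwing hammer', True)
```

Keeping the rules in plain lookup tables separates the learned detectors (YOLO, ST-GCN) from site-specific safety policy, which can then be edited without retraining.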
- Research Article
35
- 10.1016/j.compstruct.2023.117496
- Aug 22, 2023
- Composite Structures
Spatial-temporal graph convolutional networks (STGCN) based method for localizing acoustic emission sources in composite panels
- Research Article
13
- 10.1088/1742-6596/1621/1/012047
- Aug 1, 2020
- Journal of Physics: Conference Series
In autonomous driving scenarios, pedestrian trajectory prediction is an important research direction. Based on the spatio-temporal graph convolutional neural network, we propose a new pedestrian trajectory prediction algorithm. The new algorithm constructs a series of new models around pedestrian intention estimation. The estimation algorithm considers the following aspects: the contextual information of pedestrians and the surrounding environment; the “pedestrian ego-vehicle” interaction combined with vehicle speed estimation; and the pedestrian’s own skeletal structure and body language, including the head joints and their structural relationship to the torso joints (for example, whether the head is out of the torso plane or rotated). For skeleton feature extraction, a graph convolutional neural network represents each pedestrian as a graph of joints in non-Euclidean space, and a spatial temporal graph convolutional network is then used for feature extraction and learning. The new method is named the “head-torso”-based spatial temporal graph convolutional network (HT-STGCN). On the PID dataset, the method achieves substantial improvements over mainstream methods. Experimental results show that combining HT-STGCN with observed actions can improve trajectory prediction.
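One head-torso cue mentioned above, whether the head lies out of the torso plane, can be computed from 3-D joint positions. In this sketch, the torso plane is spanned by the shoulder line and the neck-to-hip direction, and the cue is the angle between the neck-to-head vector and that plane. The joint choice and the function name are hypothetical, and the rotation cue is not covered.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def head_out_of_plane_deg(head, neck, l_shoulder, r_shoulder, hip_center):
    """Angle in degrees between the neck->head vector and the torso plane.

    The torso plane is spanned by the shoulder line and the neck->hip direction;
    0 means the head lies in that plane, 90 means it points straight out of it.
    """
    normal = cross(sub(r_shoulder, l_shoulder), sub(hip_center, neck))
    h = sub(head, neck)
    s = abs(dot(normal, h)) / math.sqrt(dot(normal, normal) * dot(h, h))
    return math.degrees(math.asin(min(1.0, s)))


# Torso in the x-y plane; the head tilts forward (+z) by 45 degrees.
torso = dict(neck=(0, 0, 0), l_shoulder=(-1, 0, 0), r_shoulder=(1, 0, 0), hip_center=(0, -2, 0))
print(round(head_out_of_plane_deg((0, 1, 1), **torso), 1))  # 45.0
```

Such a scalar could be appended to the per-frame joint features before the spatial temporal graph convolution; the `min(1.0, s)` clamp guards `asin` against rounding slightly above 1.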