Abstract

Skeleton-based action recognition helps in understanding human behavior in videos and has received much attention in recent years as an important research area within action recognition. Current research focuses on designing more advanced algorithms to better extract spatio-temporal information from skeleton data. However, because existing skeleton datasets are small and effective data augmentation methods are lacking, models overfit easily during training. To address this challenge, we propose a mix-based data augmentation method, Joint Mixing Data Augmentation (JMDA), which can generally improve the effectiveness and robustness of various skeleton-based action recognition algorithms. For spatial information, we introduce SpatialMix (SM), which projects the original discrete 3D skeleton information into a 2D space and then mixes the projected spatial information of two random samples during training to achieve spatial mixing augmentation. For temporal information, we propose TemporalMix (TM): leveraging the temporal continuity of skeleton data, we apply a temporal resize operation to the original skeleton data and then merge two random samples during training to achieve temporal mixing augmentation. Additionally, we analyze the Feature Mismatch (FM) problem caused by introducing mix-based data augmentation into skeleton data, and we propose a new data preprocessing method, Feature Alignment (FA), to address this problem and improve model performance. Moreover, we propose a novel training pipeline, Joint Training Strategy (JTS), which combines multiple mix-based data augmentation methods to further improve model performance. JMDA is plug-and-play and widely applicable to skeleton-based action recognition models. It adds no model parameters and almost no additional training cost. We conduct extensive experiments on the NTU RGB+D 60 and NTU RGB+D 120 datasets to demonstrate the effectiveness and robustness of JMDA on several mainstream skeleton-based action recognition algorithms.
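
To make the temporal mixing idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the method as implemented in this work: the abstract does not specify how the resize or the merge is done, so the sketch assumes linear interpolation along the time axis and a simple concatenation of the two resized clips. The function names `temporal_resize` and `temporal_mix`, the mixing ratio `lam`, and the use of the frame fraction as a label weight are all hypothetical.

```python
def temporal_resize(seq, new_len):
    """Linearly resample a sequence of frames to new_len frames.

    seq is a list of frames; each frame is a flat list of joint coordinates.
    Linear interpolation is an assumption made for this sketch.
    """
    old_len = len(seq)
    if new_len == 1 or old_len == 1:
        # Degenerate case: repeat the first frame.
        return [list(seq[0]) for _ in range(new_len)]
    out = []
    for i in range(new_len):
        # Position of output frame i on the original time axis.
        pos = i * (old_len - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(seq[lo], seq[hi])])
    return out


def temporal_mix(seq_a, seq_b, lam):
    """Hypothetical temporal mix: compress both clips, then join them in time.

    lam in (0, 1) is the target fraction of output frames taken from seq_a.
    The realised fraction is returned so it can weight the two labels.
    Assumes len(seq_a) >= 2; the output keeps the length of seq_a.
    """
    total = len(seq_a)
    len_a = max(1, min(total - 1, round(lam * total)))
    mixed = temporal_resize(seq_a, len_a) + temporal_resize(seq_b, total - len_a)
    return mixed, len_a / total
```

In a training loop, `seq_a` and `seq_b` would be two randomly drawn samples, and the returned fraction could weight their one-hot labels in the same way as in other mix-based augmentations; that pairing is likewise an assumption of this sketch.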
