Human activity recognition (HAR) has become a crucial research area for many applications, such as healthcare and surveillance. With the development of artificial intelligence (AI) and the Internet of Things (IoT), sensor-based HAR has gained increasing attention and offers notable advantages over earlier approaches. However, relying solely on existing labeled data may not adequately ensure that a model generalizes to new data. We propose CLEAR, a method designed to improve the accuracy of multimodal human activity recognition. CLEAR combines data augmentation, multimodal feature fusion, and contrastive learning to extract and refine highly discriminative features from multiple data sources, thereby enhancing the model’s capacity to identify and classify diverse human activities accurately. CLEAR achieves high generalization performance on unseen datasets using only the training data, and it can be applied directly to various target domains without retraining or fine-tuning. Specifically, CLEAR consists of two parts. First, it employs data augmentation techniques in both the time and frequency domains to enrich the training data. Second, it optimizes feature extraction using attention-based multimodal fusion and employs supervised contrastive learning to improve feature discriminability. We achieved accuracy rates of 81.09%, 90.45%, and 82.75% on three public datasets, USC-HAD, DSADS, and PAMAP2, respectively. Additionally, when the training data are reduced from 100% to 20%, the model’s accuracy on the three datasets decreases by only about 5%, demonstrating that our model possesses strong generalization capabilities.
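The two parts described above can be illustrated with a minimal NumPy sketch. The abstract does not specify which augmentations or which loss formulation CLEAR uses, so everything here is an assumption for illustration: the function names (`jitter_and_scale`, `perturb_amplitude`, `supervised_contrastive_loss`), the choice of jitter, scaling, and spectral amplitude perturbation as augmentations, the default parameter values, and the use of the standard supervised contrastive (SupCon) loss formulation. The attention-based fusion step is not shown.

```python
import numpy as np

def jitter_and_scale(x, rng, sigma=0.05, scale_range=(0.9, 1.1)):
    """Time-domain augmentation (assumed): per-channel scaling plus Gaussian noise.
    x has shape (time_steps, channels)."""
    scale = rng.uniform(*scale_range, size=(1, x.shape[1]))
    return x * scale + rng.normal(0.0, sigma, size=x.shape)

def perturb_amplitude(x, rng, strength=0.1):
    """Frequency-domain augmentation (assumed): rescale FFT bins by a random
    real gain, which changes magnitudes and leaves phases untouched."""
    spec = np.fft.rfft(x, axis=0)
    gain = 1.0 + rng.uniform(-strength, strength, size=spec.shape)
    return np.fft.irfft(spec * gain, n=x.shape[0], axis=0)

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Standard SupCon loss over embeddings z of shape (N, D): samples with the
    same label are pulled together, all others are pushed apart."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # compare by cosine similarity
    labels = np.asarray(labels)
    not_self = ~np.eye(len(labels), dtype=bool)
    # Exclude each anchor's similarity to itself from the softmax.
    logits = np.where(not_self, z @ z.T / temperature, -np.inf)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0  # anchors with at least one same-class partner
    if not valid.any():
        return 0.0
    sum_pos = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(sum_pos[valid] / n_pos[valid]).mean())

# Usage on synthetic data: one window of 128 time steps from 6 sensor channels.
rng = np.random.default_rng(0)
window = rng.normal(size=(128, 6))
time_view = jitter_and_scale(window, rng)
freq_view = perturb_amplitude(window, rng)
```

In a training loop, the augmented views would be passed through the feature extractor, and the resulting embeddings, together with their activity labels, would be fed to the contrastive loss alongside the usual classification loss. How CLEAR weights or combines these terms is not stated in the abstract.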