WECAR: An End-Edge Collaborative Inference and Training Framework for WiFi-Based Continuous Human Activity Recognition
WiFi-based human activity recognition (HAR) holds significant promise for ubiquitous sensing in smart environments. A critical challenge is enabling systems to adapt dynamically to evolving scenarios, learn new activities without catastrophically forgetting prior knowledge, and meet the computational constraints of edge devices. Current approaches struggle to reconcile these requirements because of high historical data storage demands and inefficient parameter utilization. We propose WECAR, an end-edge collaborative inference and training framework for WiFi-based continuous HAR. In this framework, edge devices handle model training, lightweight optimization, and updates, while end devices perform efficient inference. WECAR introduces two key innovations: dynamic continual learning with parameter efficiency, and hierarchical distillation for end deployment. For the former, we propose a transformer-based architecture enhanced by task-specific dynamic model expansion and stability-aware selective retraining. For the latter, we propose a dual-phase distillation mechanism comprising multi-head self-attention relation distillation and prefix relation distillation. We implement WECAR on heterogeneous hardware, using the Jetson Nano as the edge device and the ESP32 as the end device. Our experiments on three public WiFi datasets show that WECAR not only outperforms several state-of-the-art methods in performance and parameter efficiency, but also substantially reduces the model’s parameter count after optimization without sacrificing accuracy.
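The abstract names multi-head self-attention relation distillation but gives no formula. One common formulation (popularized by MiniLM) matches the teacher's and student's row-wise softmax(QK^T/√d) distributions with a KL divergence; the minimal sketch below assumes that formulation, and every name in it (`attention_relation`, `relation_distill_loss`) is hypothetical rather than part of WECAR. Because both relation matrices are L × L, the student may use a smaller head dimension than the teacher, which is one reason relation-based losses suit compression for small end devices.

```python
import math
from typing import List

# Hypothetical MiniLM-style sketch; WECAR's exact loss is not given in the abstract.
Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_relation(q: Matrix, k: Matrix) -> Matrix:
    """Row-wise softmax(Q K^T / sqrt(d)): an L x L token-to-token relation."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return [softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]) for qi in q]


def kl_divergence(p: List[float], q: List[float]) -> float:
    # KL(p || q); softmax outputs stay positive for moderate logits.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def relation_distill_loss(t_q: Matrix, t_k: Matrix, s_q: Matrix, s_k: Matrix) -> float:
    """Mean row-wise KL from the teacher relation to the student relation.

    Teacher and student may have different head dimensions; only the
    sequence length L must match, since both relations are L x L.
    """
    t_rel = attention_relation(t_q, t_k)
    s_rel = attention_relation(s_q, s_k)
    return sum(kl_divergence(tr, sr) for tr, sr in zip(t_rel, s_rel)) / len(t_rel)
```

The second (prefix relation) distillation phase is not sketched here.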
- Conference Article
16
- 10.1109/icccworkshops52231.2021.9538931
- Jul 28, 2021
Nowadays, WiFi-based human activity recognition (HAR), as a key enabler of smart homes, has gained tremendous attention because of superior properties such as privacy protection and low-cost deployment. Since each human motion within the signal coverage causes different wireless channel disturbances, it is possible to identify and interpret these activity-induced signal changes for human behavior recognition. Although many approaches attempt to extract distinct patterns from WiFi measurements corresponding to user activities, the signals can be easily attenuated by environmental variations in real settings, so their recognition accuracy may deteriorate severely. To extract the key features more discriminatively, in this paper we propose WiWave, a WiFi-based device-free HAR system leveraging a wavelet-integrated convolutional neural network (CNN). Instead of using pooling operations, our network introduces the discrete wavelet transform (DWT) into the convolutional architecture, combining the good time-frequency localization of the wavelet transform with the self-learning ability of the neural network. Consequently, high-level features can be obtained automatically from the low-frequency components, and the size of the feature map is reduced at the same time. The experimental results demonstrate that WiWave achieves an average accuracy of 94.87% in distinguishing ten actions in a real-world home environment.
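WiWave replaces pooling with the DWT. A minimal sketch of that idea follows, assuming a single-level orthonormal 2-D Haar wavelet (the abstract does not say which wavelet WiWave uses). The LL subband halves each spatial dimension just as 2 × 2 pooling would, while the three detail subbands retain the high-frequency content instead of discarding it. `haar_dwt2` is a hypothetical helper, and LH/HL naming varies between libraries.

```python
from typing import List, Tuple

# Assumption: single-level Haar DWT; WiWave's actual wavelet is not stated.
Matrix = List[List[float]]


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2-D Haar DWT on an even-sized feature map.

    Returns (LL, LH, HL, HH), each half the height and width of x. LL is the
    low-frequency approximation that can stand in for pooling.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("feature map height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + c + d) / 2)  # LL: scaled local average
            rows[1].append((a + b - c - d) / 2)  # LH: top vs. bottom difference
            rows[2].append((a - b + c - d) / 2)  # HL: left vs. right difference
            rows[3].append((a - b - c + d) / 2)  # HH: diagonal difference
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh
```

For a constant 4 × 4 map of ones, LL is a 2 × 2 map of 2.0 and all detail subbands are zero, mirroring how pooling keeps only the smooth component.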
- Research Article
- 10.1109/jiot.2026.3676817
- Jan 1, 2026
- IEEE Internet of Things Journal
WiFi-based human activity recognition (HAR) has emerged as a focal point within the Internet of Things landscape, owing to its non-intrusive sensing capabilities and inherent privacy-preserving advantages. In existing WiFi-based HAR research, channel state information (CSI) is primarily used to capture activity-related features and enable recognition. However, CSI-based cross-domain HAR remains challenged by redundant subcarriers, limited samples in the target domain, and the high sensitivity of CSI to environmental variations. To address these challenges, this paper proposes Wi-DMAR, a WiFi-based cross-domain HAR framework that integrates three key modules. First, an adaptive subcarrier selection module computes the correlation between each subcarrier and the principal components and identifies the subcarriers with high contribution, which preserves essential activity-related features, reduces data dimensionality, and lowers computational overhead. Second, a conditional diffusion–based data augmentation module employs a Transformer-based feature extractor to capture domain-specific representations of target-domain data, and optimizes a domain consistency loss and a domain-guided diffusion loss to generate pseudo samples that resemble the target-domain distribution, thereby mitigating sample scarcity. Third, an activity recognition module based on sample similarity learning reformulates the traditional label classification problem as a sample comparison task; by quantifying the similarity between samples, it performs activity recognition and enhances cross-domain generalization. Experimental results demonstrate that Wi-DMAR achieves higher recognition accuracy than state-of-the-art cross-domain HAR methods such as DiffAR and MetaAct. Ablation studies further confirm that each core component contributes positively to performance.
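The adaptive subcarrier selection step can be illustrated with a small sketch. It makes simplifying assumptions that the abstract does not state: only the first principal component is used (found by power iteration on the subcarrier covariance matrix), each subcarrier is scored by the absolute Pearson correlation between its amplitude series and the PC1 scores, and the top k are kept. `select_subcarriers` and its helpers are hypothetical names, not Wi-DMAR's code.

```python
import math
from typing import List

# Hypothetical sketch: PC1 only, absolute Pearson correlation as the score.
Matrix = List[List[float]]  # x[t][s]: amplitude at time t on subcarrier s


def center_columns(x: Matrix) -> Matrix:
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def first_pc_scores(x: Matrix, iters: int = 200) -> List[float]:
    """Project each centered time sample onto the first principal axis."""
    xc = center_columns(x)
    n, s = len(xc), len(xc[0])
    cov = [[sum(r[a] * r[b] for r in xc) / (n - 1) for b in range(s)] for a in range(s)]
    v = [1.0] * s
    for _ in range(iters):  # power iteration converges to the top eigenvector
        w = [sum(cov[a][b] * v[b] for b in range(s)) for a in range(s)]
        norm = math.sqrt(sum(val * val for val in w))
        if norm == 0.0:
            break  # constant input: no variance to follow
        v = [val / norm for val in w]
    return [sum(r[j] * v[j] for j in range(s)) for r in xc]


def pearson(a: List[float], b: List[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den > 0 else 0.0


def select_subcarriers(x: Matrix, k: int) -> List[int]:
    """Indices (ascending) of the k subcarriers most correlated with PC1."""
    scores = first_pc_scores(x)
    corr = [abs(pearson(list(col), scores)) for col in zip(*x)]
    top = sorted(range(len(corr)), key=lambda j: corr[j], reverse=True)[:k]
    return sorted(top)
```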
- Research Article
148
- 10.1109/jiot.2018.2871445
- Apr 1, 2019
- IEEE Internet of Things Journal
Pervasive WiFi signals not only provide fundamental communications for massive numbers of Internet of Things devices but also enable cognitive sensing in many other applications, such as human activity recognition. State-of-the-art WiFi-based device-free systems leverage the correlations between signal changes and body movements for human activity recognition. They have demonstrated reasonably good recognition results with a properly placed transceiver pair, or, in other words, when the human body is within a certain sweet zone. Unfortunately, the sweet zone is not ubiquitous. When the person moves out of the area and enters a dead zone, or even just changes orientation, the recognition accuracy can quickly decay. In this paper, we closely examine such spatial diversity in WiFi-based human activity recognition. We identify the dead zones and their key influential factors, and accordingly present WiSDAR, a WiFi-based spatial diversity-aware device-free activity recognition system. WiSDAR overcomes the dead zones with only one physical WiFi sender and receiver. The key innovation is to use the multiple antennas of modern WiFi devices to construct multiple separated antenna pairs for observing activities. Profiling activity features from multiple spatial dimensions is more complicated but offers much richer information for recognition. To this end, we propose a deep learning-based framework that integrates hidden features from both the temporal and spatial dimensions, achieving highly accurate and reliable recognition results. WiSDAR is fully compatible with commercial off-the-shelf WiFi devices, and we have implemented it on commonly available Intel WiFi 5300 cards. Our real-world experiments demonstrate that it recognizes human activities with a stable accuracy of around 96%.
- Research Article
- 10.1109/jiot.2025.3634412
- Jan 1, 2025
- IEEE Internet of Things Journal
Human Activity Recognition (HAR) has emerged as a critical component in intent-aware, AI-driven Internet of Things (IoT) communication systems, enabling context-aware responses in smart environments. Recently, WiFi-based HAR has gained significant attention due to its non-intrusive nature, low deployment cost, and ability to preserve privacy. However, such systems face a major challenge in generalizing across domains. To address this limitation, we propose a novel cross-domain HAR framework, called TransHAR, built on a lightweight and efficient transformer model. On the one hand, we design a feature representation block that processes Wi-Fi channel frequency response (CFR) phase data to estimate Doppler shifts, capturing motion-related dynamics while remaining invariant to static, environment-specific structures, which enhances generalization across domains. On the other hand, we propose a lightweight Transformer architecture, termed ResDyTFormer, which minimizes reliance on normalization layers by incorporating a novel Residual Dynamic Tanh function. This function dynamically learns to balance traditional normalization against the Dynamic Tanh operation, thereby maintaining training stability and avoiding the gradient vanishing issues often encountered when Dynamic Tanh is used alone. Extensive experiments on two benchmark datasets demonstrate that TransHAR achieves state-of-the-art performance in both in-domain and cross-domain HAR tasks with only 0.17M parameters. On the SHARP dataset, it attains a 99.04% F1 score and 98.94% accuracy; on the 3DO dataset, it achieves 86.10% accuracy and an 84.95% F1 score. These results highlight the potential of TransHAR as an efficient and scalable framework for real-world WiFi-based human activity sensing.
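Dynamic Tanh (DyT), proposed as a drop-in replacement for normalization layers, computes γ ⊙ tanh(αx) + β elementwise, with a learnable scalar α and per-channel γ and β, and uses no token statistics. The abstract says the Residual Dynamic Tanh learns to balance normalization against DyT but gives no formula, so the sketch below assumes a hypothetical sigmoid-gated mix of an affine-free LayerNorm and DyT. `residual_dyt` and `gate_logit` are illustrative names only.

```python
import math
from typing import List

# Hypothetical gate; the abstract does not give ResDyT's actual formula.


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    # Affine-free LayerNorm over one token's feature vector.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dyt(x: List[float], alpha: float, gamma: List[float], beta: List[float]) -> List[float]:
    # Dynamic Tanh: elementwise gamma * tanh(alpha * x) + beta.
    return [g * math.tanh(alpha * v) + b for v, g, b in zip(x, gamma, beta)]


def residual_dyt(x: List[float], alpha: float, gamma: List[float],
                 beta: List[float], gate_logit: float) -> List[float]:
    """Learnable mix: sigmoid(gate_logit) * LN(x) + (1 - sigmoid(gate_logit)) * DyT(x)."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    ln = layer_norm(x)
    dy = dyt(x, alpha, gamma, beta)
    return [g * a + (1.0 - g) * b for a, b in zip(ln, dy)]
```

With a large positive `gate_logit` the layer behaves like LayerNorm; with a large negative one it reduces to DyT, so training can move between the two.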
- Research Article
- 10.54254/2755-2721/2025.po25263
- Jul 20, 2025
- Applied and Computational Engineering
WiFi-based Human Activity Recognition (HAR) enables privacy-preserving, device-free motion detection using Channel State Information (CSI) from commodity devices. However, CSI's low resolution and noise complicate spatiotemporal feature extraction. We propose DSA-Net, a Dual-Path Spatial-Temporal Attention Network tailored to CSI-based HAR. It combines a slow-fast temporal pathway with cross-spatial attention to capture fine-grained and long-range dependencies, while a Transformer-based fusion module adaptively integrates spatiotemporal features. Evaluated on the Widar3.0 dataset, DSA-Net surpasses Vision Transformer (ViT) baselines by 18.91 percentage points, achieving superior accuracy with low computational overhead. Our results demonstrate DSA-Net's potential for scalable, real-time activity recognition in IoT and smart environments.
- Research Article
10
- 10.1145/3607254
- May 11, 2024
- ACM Transactions on Sensor Networks
WiFi-based human activity recognition (HAR) plays an essential role in various applications such as security surveillance, health monitoring, and smart homes. Existing HAR methods, though yielding promising performance in indoor scenarios, depend heavily on massive labeled datasets for training, which are extremely difficult to acquire in practical applications. In this paper, we present an automatic data labeling and HAR system, termed AutoDLAR. Built around a semi-supervised cross-modal learning framework with a hybrid loss function, AutoDLAR transfers rich visual information to automatically label WiFi signals for WiFi-based HAR. Specifically, we devise a lightweight, multi-view WiFi sensing model with a parallel feature embedding method to identify activities accurately and accelerate recognition. We then exploit video data to fine-tune a well-established visual HAR model, generating effective pseudo-labels that guide the WiFi model’s training. We also build a synchronized Video-WiFi dataset with seven types of human activities under different scenarios to enable training and validation of the semi-supervised HAR system. Extensive experiments on our collected activity dataset and the emotion recognition benchmark demonstrate that AutoDLAR attains an average accuracy of over 95.89% without manual labeling and requires an inference time of only 3.35 ms, outperforming state-of-the-art (SOTA) methods.
- Research Article
6
- 10.1609/aaai.v39i13.33565
- Apr 11, 2025
- Proceedings of the AAAI Conference on Artificial Intelligence
WiFi-based human activity recognition (HAR) holds significant application potential across various fields. To handle dynamic environments where new activities are continuously introduced, WiFi-based HAR systems must adapt by learning new concepts without forgetting previously learned ones. Furthermore, retaining knowledge of old activities by storing historical exemplars is impractical for WiFi-based HAR due to privacy concerns and the limited storage capacity of edge devices. In this work, we propose ConSense, a lightweight and fast-adapting exemplar-free class-incremental learning framework for WiFi-based HAR. The framework leverages the transformer architecture and involves dynamic model expansion and selective retraining to preserve previously learned knowledge while integrating new information. Specifically, during incremental sessions, small-scale trainable parameters trained specifically on each task's data are added to the multi-head self-attention layer. In addition, a selective retraining strategy dynamically adjusts the weights in the multilayer perceptron based on the performance stability of neurons across tasks. Rather than training the entire model, these dynamic model expansion and selective retraining strategies reduce the overall computational load while balancing stability on previous tasks and plasticity on new tasks. Evaluation results on three public WiFi datasets demonstrate that ConSense not only outperforms several competitive approaches but also requires fewer parameters, highlighting its practical utility in class-incremental scenarios for HAR.
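The abstract does not define the "performance stability of neurons", so the following sketch of selective retraining assumes one plausible proxy: the variance of each MLP neuron's mean activation across tasks. The most stable fraction of neurons is frozen through a 0/1 mask, and an SGD step then updates only the weight rows of unfrozen neurons. `stability_mask`, `masked_sgd_step`, and `freeze_ratio` are hypothetical names.

```python
from statistics import pvariance
from typing import List

# Assumption: cross-task activation variance as a proxy for neuron stability.


def stability_mask(task_means: List[List[float]], freeze_ratio: float) -> List[int]:
    """0/1 mask per MLP neuron: 0 = frozen (most stable), 1 = retrainable.

    task_means[t][n] is neuron n's mean activation on task t's data.
    """
    n_neurons = len(task_means[0])
    variances = [pvariance([task[n] for task in task_means]) for n in range(n_neurons)]
    n_frozen = int(freeze_ratio * n_neurons)
    frozen = sorted(range(n_neurons), key=lambda n: variances[n])[:n_frozen]
    mask = [1] * n_neurons
    for n in frozen:
        mask[n] = 0
    return mask


def masked_sgd_step(weights: List[List[float]], grads: List[List[float]],
                    mask: List[int], lr: float) -> List[List[float]]:
    # Row r holds the incoming weights of output neuron r; frozen rows stay unchanged.
    return [[w - lr * m * g for w, g in zip(w_row, g_row)]
            for w_row, g_row, m in zip(weights, grads, mask)]
```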
- Book Chapter
11
- 10.1007/978-3-031-26438-2_10
- Jan 1, 2023
Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered from existing indoor wireless infrastructure through WiFi signals, without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multipath effects in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework covering 12 activities in three different spatial environments, using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments demonstrate that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves overall accuracies of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches 98.54%, 94.25%, and 95.09% across the same environments.
- Research Article
17
- 10.3390/info14070404
- Jul 14, 2023
- Information
Human Activity Recognition (HAR) has been a popular area of research in the Internet of Things (IoT) and Human–Computer Interaction (HCI) over the past decade. The objective of this field is to detect human activities through numeric or visual representations, and its applications include smart homes and buildings, action prediction, crowd counting, patient rehabilitation, and elderly monitoring. Traditionally, HAR has been performed through vision-based, sensor-based, or radar-based approaches. However, vision-based and sensor-based methods can be intrusive and raise privacy concerns, while radar-based methods require special hardware, making them more expensive. WiFi-based HAR is a cost-effective alternative, in which WiFi access points serve as transmitters and users’ smartphones serve as receivers. In this approach, HAR is mainly performed using two wireless-channel metrics: the Received Signal Strength Indicator (RSSI) and Channel State Information (CSI). CSI provides more stable and comprehensive information about the channel than RSSI. In this research, we used a convolutional neural network (CNN) as a classifier and applied edge-detection techniques in a preprocessing phase to improve the quality of activity detection. We converted CSI data into RGB images and tested our methodology on three available CSI datasets. The results showed that the proposed method achieved better accuracy and faster training than using the plain RGB-represented data. To further justify the effectiveness of our approach, we repeated the experiment by applying raw CSI data to long short-term memory (LSTM) and Bidirectional LSTM classifiers.
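The edge-detection preprocessing can be sketched on one channel of a CSI image (for example, time along rows and subcarriers along columns). The abstract does not name the detector, so this minimal example assumes the Sobel operator and returns the gradient magnitude over the valid interior, which is two pixels smaller than the input in each dimension. `sobel_magnitude` is a hypothetical helper.

```python
import math
from typing import List

# Assumption: Sobel operator; the paper's edge detector is not named in the abstract.
Matrix = List[List[float]]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to left/right changes
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to top/bottom changes


def sobel_magnitude(img: Matrix) -> Matrix:
    """Gradient magnitude sqrt(gx^2 + gy^2) at every interior pixel."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            gx = sum(SOBEL_X[a][b] * img[i + a - 1][j + b - 1] for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * img[i + a - 1][j + b - 1] for a in range(3) for b in range(3))
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out
```

On a 4 × 4 image whose rows are `[0, 0, 10, 10]`, every interior pixel sits on the vertical step and gets magnitude 40.0, while flat regions would give 0.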
- Conference Article
15
- 10.1109/healthcom.2017.8210783
- Oct 1, 2017
WiFi-based human activity recognition has attracted attention in the fields of human-computer interaction, smart homes, and security monitoring. We first construct a WiFi-based activity dataset, namely WiAR, to provide a benchmark for existing works. Then, we leverage the moving variance of CSI to detect the start and end of an activity. Moreover, we present a K-means-based subcarrier selection mechanism that selects subcarriers according to their sensitivity to human activity, enhancing the robustness of human activity recognition. Finally, we use several classification algorithms to evaluate the performance of WiAR. Our results show that WiAR meets the primary requirements, achieving an average accuracy of more than 93% with SVM and more than 80% with kNN, random forest, and decision tree.
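Moving-variance segmentation works because CSI amplitude is nearly flat while a person is still and fluctuates during motion. The sketch below assumes a population variance over a sliding window and a fixed threshold, neither of which the WiAR abstract specifies; `detect_segments` is a hypothetical name. The reported spans are slightly wider than the true activity, because each active window covers `window` samples.

```python
from statistics import pvariance
from typing import List, Tuple

# Assumptions: population variance per window and a fixed, hand-picked threshold.


def moving_variance(x: List[float], window: int) -> List[float]:
    # Variance of each length-`window` slice x[i:i + window].
    return [pvariance(x[i:i + window]) for i in range(len(x) - window + 1)]


def detect_segments(x: List[float], window: int, threshold: float) -> List[Tuple[int, int]]:
    """Inclusive (start, end) sample indices of runs where moving variance > threshold.

    A run of active windows i..k is reported as samples i..k + window - 1,
    i.e. every sample covered by an active window.
    """
    mv = moving_variance(x, window)
    segments, start = [], None
    for i, v in enumerate(mv):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            segments.append((start, i - 1 + window - 1))
            start = None
    if start is not None:
        segments.append((start, len(mv) - 1 + window - 1))
    return segments
```

For a signal that is flat for 10 samples, oscillates on samples 10 to 17, and is flat again, a window of 4 and threshold of 1.0 report a single activity spanning samples 8 to 20.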
- Research Article
205
- 10.1609/aaai.v35i1.16103
- May 18, 2021
- Proceedings of the AAAI Conference on Artificial Intelligence
Recognition of human activities is an important task due to its far-reaching applications such as healthcare systems, context-aware applications, and security monitoring. Recently, WiFi-based human activity recognition (HAR) has become ubiquitous due to its non-invasiveness. Existing WiFi-based HAR methods regard WiFi signals as a temporal sequence of channel state information (CSI) and employ deep sequential models (e.g., RNN, LSTM) to automatically capture channel-over-time features. Although remarkably effective, they suffer from two major drawbacks. First, the granularity of a single time point is too elementary to represent meaningful CSI patterns. Second, time-over-channel features are also important and could serve as a natural form of data augmentation. To address these drawbacks, we propose a novel Two-stream Convolution Augmented Human Activity Transformer (THAT) model. Our model uses a two-stream structure to capture both time-over-channel and channel-over-time features, and a multi-scale convolution augmented transformer to capture range-based patterns. Extensive experiments on four real experiment datasets demonstrate that our model outperforms state-of-the-art models in terms of both effectiveness and efficiency.
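The two streams amount to reading the same T × C CSI matrix along two axes: as T tokens of C values (channel-over-time) and as C tokens of T values (time-over-channel). The sketch below shows those two views plus a zero-padded multi-scale 1-D convolution, the kind of operator that captures range-based patterns at several receptive-field sizes. It is an illustration under assumptions, not THAT's architecture: the kernels here are fixed, whereas a model would learn them, and all names are hypothetical.

```python
from typing import List

# Illustrative only: fixed kernels, not THAT's learned convolution layers.
Matrix = List[List[float]]  # csi[t][c]: amplitude at time t, channel c


def two_stream_views(csi: Matrix):
    """Channel-over-time view: T tokens of length C (one per time step).
    Time-over-channel view: C tokens of length T (one per channel)."""
    channel_over_time = [list(row) for row in csi]
    time_over_channel = [list(col) for col in zip(*csi)]
    return channel_over_time, time_over_channel


def conv1d_same(seq: List[float], kernel: List[float]) -> List[float]:
    # Zero-padded 'same' cross-correlation for an odd-length kernel.
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(seq))]


def multi_scale(seq: List[float], kernels: List[List[float]]) -> List[List[float]]:
    # One output sequence per kernel size, i.e. per receptive-field range.
    return [conv1d_same(seq, k) for k in kernels]
```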
- Conference Article
4
- 10.1109/iccc56324.2022.10065954
- Dec 9, 2022
Human activity recognition (HAR) plays an important role in many applications such as smart homes, healthcare services, and security monitoring. Recently, WiFi-based HAR has become increasingly popular due to its non-invasiveness. Most existing HAR works use only classification methods for activity recognition, without considering the start and end times of actions. In this paper, we propose a detection method that predicts both the type of an activity and its start and end times. For detection tasks, both global and local information are essential for modeling and identifying various types of activities. We therefore propose a multi-scale convolution Transformer that exploits local features of WiFi data more effectively using CNNs, while global features are captured by the Transformer. In our experiments, the proposed model shows outstanding performance in an indoor environment, with a weak micro F1 score of 98.37% and a strong micro F1 score of 92.81%.
- Conference Article
8
- 10.1109/icpads53394.2021.00006
- Dec 1, 2021
WiFi-based human activity recognition has been widely used in many fields such as health diagnosis, intrusion detection, and smart homes. Most existing recognition methods achieve satisfactory accuracy only within a single domain; accuracy drops when a model trained in a source domain is used in a target domain. Meanwhile, because directly fine-tuning the network is often impossible or prone to overfitting with limited labeled target data, transfer learning methods with domain-adaptive layers have been proposed to solve these problems, but they align only the marginal distributions, which may lose many fine-grained features. We therefore present DSANAR, an end-to-end deep subdomain adaptive network for activity recognition using Channel State Information (CSI). Based on a local maximum mean discrepancy (LMMD), DSANAR simultaneously aligns the marginal distributions and matches the conditional distributions of the relevant subdomains (one per activity category), capturing more fine-grained features. Trained with a joint cross-entropy and adaptive loss, DSANAR outperforms other state-of-the-art methods on a self-built dataset, with an average cross-domain accuracy of 95.6%.
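LMMD compares source and target features class by class rather than as two global clouds. A minimal sketch follows, assuming a simplified version of the formulation from the Deep Subdomain Adaptation Network (DSAN): source samples are weighted by their one-hot labels, target samples by predicted class probabilities, each class's weights are normalized to sum to 1 within a domain, and the per-class squared MMD under a Gaussian kernel is averaged over the classes present in both domains. The kernel bandwidth `sigma` and all function names are assumptions.

```python
import math
from typing import List

# Simplified DSAN-style LMMD sketch; not DSANAR's exact implementation.
Vec = List[float]


def rbf(a: Vec, b: Vec, sigma: float) -> float:
    # Gaussian (RBF) kernel between two feature vectors.
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def class_weights(probs: List[Vec], c: int) -> Vec:
    # Weight of each sample for class c, normalized to sum to 1 within its domain.
    col = [p[c] for p in probs]
    total = sum(col)
    return [w / total for w in col] if total > 0 else []


def lmmd(src: List[Vec], src_onehot: List[Vec], tgt: List[Vec],
         tgt_probs: List[Vec], sigma: float = 1.0) -> float:
    """Class-conditional squared MMD, averaged over classes seen in both domains.

    Source weights come from one-hot labels; target weights come from the
    classifier's predicted probabilities (soft pseudo-labels).
    """
    total, used = 0.0, 0
    for c in range(len(src_onehot[0])):
        ws, wt = class_weights(src_onehot, c), class_weights(tgt_probs, c)
        if not ws or not wt:
            continue  # class missing from one domain in this batch: skip it
        ss = sum(ws[i] * ws[j] * rbf(src[i], src[j], sigma)
                 for i in range(len(src)) for j in range(len(src)))
        tt = sum(wt[i] * wt[j] * rbf(tgt[i], tgt[j], sigma)
                 for i in range(len(tgt)) for j in range(len(tgt)))
        st = sum(ws[i] * wt[j] * rbf(src[i], tgt[j], sigma)
                 for i in range(len(src)) for j in range(len(tgt)))
        total += ss + tt - 2.0 * st
        used += 1
    return total / used if used else 0.0
```

Identical, identically labeled source and target batches give an LMMD of 0; shifting the target features away from their same-class source features makes it positive.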
- Research Article
19
- 10.1109/jbhi.2022.3219640
- Jan 1, 2023
- IEEE Journal of Biomedical and Health Informatics
WiFi-based human activity recognition (HAR) has been extensively studied due to its far-reaching applications in health domains, including elderly monitoring, exercise supervision, and rehabilitation monitoring. Although existing supervised deep learning techniques have achieved remarkable performance on these tasks, they are data-hungry, and labeled WiFi-based HAR data are notoriously difficult to obtain because such data are privacy-sensitive and hard to interpret. Existing contrastive learning models, mainly designed for computer vision, cannot guarantee their performance on channel state information (CSI) data. To this end, we propose a new dual-stream contrastive learning model that can process and learn from raw WiFi CSI data in a self-supervised manner. More specifically, our proposed method, coined DualConFi, takes raw WiFi CSI data as input and incorporates channel and temporal streams to learn highly discriminative spatiotemporal features from unlabeled data under a mutual information constraint. We demonstrate the effectiveness of our model on three publicly available CSI datasets in various experimental settings, including linear evaluation, semi-supervised learning, and transfer learning, and show that DualConFi performs favourably against challenging baselines in each setting. Moreover, by studying the effects of different transform functions on CSI data, we verify the effectiveness of highly discriminative features.
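DualConFi's exact objective is not given in the abstract. As a hedged illustration of contrastive learning under a mutual information constraint, the sketch below implements the widely used NT-Xent (InfoNCE-style) loss, whose minimization maximizes a lower bound on the mutual information between two views. Here `z1[i]` and `z2[i]` could stand for the channel-stream and temporal-stream embeddings of the same CSI sample; all names and the temperature `tau` are assumptions.

```python
import math
from typing import List

# Generic NT-Xent sketch; not necessarily DualConFi's objective.
Vec = List[float]


def normalize(v: Vec) -> Vec:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def nt_xent(z1: List[Vec], z2: List[Vec], tau: float = 0.5) -> float:
    """NT-Xent loss for N positive pairs (z1[i], z2[i]).

    Each of the 2N embeddings must pick out its partner among the other
    2N - 1 embeddings; the remaining 2N - 2 act as negatives.
    """
    z = [normalize(v) for v in z1 + z2]
    n = len(z1)
    loss = 0.0
    for i in range(2 * n):
        pos = i + n if i < n else i - n
        # Cosine similarity (vectors are unit length) scaled by the temperature.
        logits = {k: sum(a * b for a, b in zip(z[i], z[k])) / tau
                  for k in range(2 * n) if k != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        loss += log_denom - logits[pos]
    return loss / (2 * n)
```

Aligned pairs (each view's partner points the same way) yield a much lower loss than swapped pairs, which is the signal that pushes the two streams toward shared, discriminative features.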
- Conference Article
31
- 10.1109/wcnc49053.2021.9417590
- Mar 29, 2021
WiFi-based human activity recognition technology has attracted widespread attention for its prominent application value and theoretical significance. Existing approaches have achieved great success in same-domain sensing, where the activity samples used to train the model follow a distribution similar to that of the test data. However, in practical applications, the same activity performed by different people, with various states and habits and at different locations, should be recognized accurately and produce the same response. Cross-domain sensing technology is therefore important. Some studies explore location-independent and environment-independent methods, but few consider the influence of users' initial states, such as standing and sitting, which actually affect wireless signal transmission very differently. This paper presents a human activity recognition method that adapts to different initial states. It also addresses the accompanying issue of small-sample sensing, obviating the cumbersome work of massive data collection. We take advantage of metric learning and few-shot learning to realize cross-domain sensing with very few samples. The experiments demonstrate the feasibility and excellent performance of our method, which can recognize human activities even when the initial states differ from those in the training data.
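One common metric-learning approach to few-shot recognition is a prototypical-network-style classifier: each class prototype is the mean embedding of its few support samples, and a query takes the label of the nearest prototype. The abstract does not say which metric-learning method the paper uses, so this is a sketch under that assumption; for brevity the embedding network is the identity, and the labels and coordinates in the usage are made up.

```python
from typing import Dict, List

# Identity embedding assumed for brevity; a real system would use a learned encoder.
Vec = List[float]


def prototypes(support: Dict[str, List[Vec]]) -> Dict[str, Vec]:
    # Class prototype = mean embedding of that class's few support samples.
    return {label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
            for label, vecs in support.items()}


def classify(query: Vec, protos: Dict[str, Vec]) -> str:
    # Nearest prototype under squared Euclidean distance.
    return min(protos, key=lambda lbl: sum((q - p) ** 2 for q, p in zip(query, protos[lbl])))
```

With support samples `{"sit": [[0, 0], [0, 2]], "stand": [[10, 10], [10, 12]]}`, the prototypes are `[0, 1]` and `[10, 11]`, so a query at `[1, 1]` is labeled "sit". Adding a new class needs only a handful of support samples, not retraining, which is why this family suits small-sample sensing.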