Articles published on Action Recognition Method
1007 Search results
- Research Article
- 10.1002/itl2.70292
- May 15, 2026
- Internet Technology Letters
- Xing Chen + 1 more
Population aging has increased the demand for intelligent smart home systems for elderly care. Although Internet of Things (IoT) sensors enable unobtrusive residential monitoring, existing human activity recognition methods often rely on centralized processing and have limited ability to model heterogeneous sensing sources and long‐range temporal dependencies. To address this issue, this paper proposes a Distributed Adaptive Multi‐Agent Transformer (DAMAT) framework for smart home activity recognition. DAMAT models heterogeneous sensing streams as collaborative agents, captures long‐range temporal and cross‐agent contextual dependencies through transformer‐based interaction, and employs adaptive coordination attention to regulate agent contributions under different activity contexts. Experiments on the CASAS and UCI HAR datasets show that DAMAT consistently outperforms representative deep learning baselines. In particular, the CASAS results directly support the effectiveness of the proposed framework in distributed smart home sensing environments, while the UCI HAR results provide auxiliary evidence that the temporal modeling and adaptive coordination mechanisms remain effective on wearable inertial sensor data.
- Research Article
- 10.1016/j.neucom.2026.133172
- May 1, 2026
- Neurocomputing
- Qi Zhang + 6 more
An unsupervised open-set recognition method for user-independent human activity recognition
- Research Article
- 10.1016/j.isci.2026.115252
- May 1, 2026
- iScience
- Qu Wang + 6 more
Pedestrian navigation activity recognition method based on two-stream transformer and contrastive learning
- Research Article
- 10.1142/s021951942650048x
- Apr 24, 2026
- Journal of Mechanics in Medicine and Biology
- Chenxi Lu + 1 more
To address the problem of low recognition rate caused by the difficulty in capturing high-speed and subtle movements in table tennis, this work proposes a motion recognition method based on multimodal data and an optimized Spatial-Temporal Graph Convolutional Network (ST-GCN). The model introduces a Multi-Level Graph Convolutional Network (ML-GCN) architecture and constructs cross-level feature extraction channels, which effectively capture the spatiotemporal correlations between local subtle movements and global trajectories. The built-in hybrid attention mechanism focuses precisely on key skeletal nodes and core motion frames through adaptive weight assignment. Combined with a multimodal fusion strategy over visual signals and inertial sensor data, it significantly enhances the robustness of the model in scenarios with line-of-sight occlusion and motion blur. Test results on a self-built multimodal table tennis dataset show that this method achieves an accuracy of 88.2%, a recall rate of 89.5% and an F1-score of 88.3%. This performance is significantly superior to the original ST-GCN and existing mainstream motion recognition algorithms, which confirms the core role of each optimization module in improving feature representation capability and computational efficiency. The study provides an efficient technical solution for the intelligent analysis of complex sports movements.
- Research Article
- 10.1145/3811026
- Apr 21, 2026
- ACM Transactions on Human-Robot Interaction
- Yongqiang Jiang + 2 more
To investigate the use of different pointing forms in service scenarios, we collected the ShopPoint dataset, a skeleton-based dataset of pointing gestures from customer-shopkeeper interactions in a camera shop scenario. Thirteen participants took part in the data collection, including 3 shopkeepers with real-world customer service experience and 10 customers. We recorded 61 one-to-one role-played interactions. Coders annotated pointing gestures from videos of these interactions, emphasizing pointing arm forms (straight-arm, bent-arm, and hand-only pointing) and hand forms (index-finger and open-hand pointing). This annotation process resulted in 2959 pointing gestures. We conducted statistical analysis on the annotated data. The analysis revealed that bent-arm pointing was used more frequently than other arm forms. Straight-arm pointing was used more for far targets than for close targets, and hand-only pointing was used more for close targets. Shopkeepers used bent-arm pointing more frequently than customers when referring to far targets. To evaluate the recognition of these pointing gestures, we tested several existing Skeleton-based Action Recognition (SAR) methods on the dataset. The highest accuracy, 72.51%, was achieved using transfer learning (i.e., pre-training and fine-tuning). This evaluation indicates that although transfer learning aids performance, recognizing pointing with diverse forms remains challenging.
- Research Article
- 10.1016/j.displa.2025.103298
- Apr 1, 2026
- Displays
- Zehui Zhang + 6 more
3D human pose estimation-based action recognition method for complex industrial scenarios
- Research Article
- 10.54254/2977-3903/2026.32415
- Mar 26, 2026
- Advances in Engineering Innovation
- Zheng Han
To address the challenges of large-scale variations in human targets, the loss of spatial details, and the inconsistency between prediction confidence and localization quality in complex scenarios, this study proposes a high-quality localization-aware action recognition method based on YOLOv11d. An SPDConv downsampling structure is introduced into the backbone network and the feature fusion stage to enhance the representation capability of small-scale target features. In addition, a localization quality estimation branch is incorporated into the detection head to explicitly model the Intersection over Union (IoU) of bounding boxes, and the confidence score is reweighted by combining the estimated localization quality with class probability. Experimental results demonstrate that the proposed method achieves an mAP@50 of 96.0% and an mAP@50–95 of 72.3%, representing improvements of 0.3% and 2.8%, respectively, compared with YOLOv11.
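The confidence reweighting step described in the abstract above can be sketched as follows. The abstract does not give the formula, so the geometric blend and the `alpha` parameter are assumptions chosen for illustration, not the authors' method; `reweighted_confidence` is a hypothetical name.

```python
def reweighted_confidence(class_prob: float, est_iou: float, alpha: float = 0.5) -> float:
    """Blend class probability with estimated localization quality.

    Assumed form (not from the abstract): class_prob^(1 - alpha) * est_iou^alpha,
    where alpha controls how much the estimated IoU influences the final score.
    """
    if not (0.0 <= class_prob <= 1.0 and 0.0 <= est_iou <= 1.0):
        raise ValueError("class_prob and est_iou must lie in [0, 1]")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return class_prob ** (1.0 - alpha) * est_iou ** alpha


# Two hypothetical detections: a confident class with a loose box,
# and a slightly less confident class with a tight box.
loose_box = reweighted_confidence(class_prob=0.9, est_iou=0.4)
tight_box = reweighted_confidence(class_prob=0.8, est_iou=0.9)
print(loose_box, tight_box)
```

Under this assumed blend the tightly localized detection ranks above the loosely localized one, which is the kind of consistency between confidence and localization quality the abstract aims for.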
- Research Article
- 10.2493/jjspe.92.173
- Feb 5, 2026
- Journal of the Japan Society for Precision Engineering
- Shota Matsumiya
We are developing a fine-grained action recognition system to detect human errors for quality assurance. Recently, there have been numerous studies utilizing multiple sensors and viewpoints to enhance the accuracy of action recognition. However, at manufacturing sites, the locations available for installing sensors, including cameras, are often limited, making it difficult to mount cameras at fixed positions. In this study, we propose a method for view-invariant action recognition using 3D skeleton data. By converting the 3D skeleton data into a body-coordinate system and using it for training, we have developed and evaluated a model capable of recognizing actions from unknown viewpoints. Furthermore, challenges that need to be addressed before implementing the proposed method in practical manufacturing environments are discussed.
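One way to picture the body-coordinate conversion mentioned in the abstract above is the sketch below. The abstract does not say how the frame is built, so the choice of origin (hip midpoint), the axes, and the joint names `left_hip`, `right_hip`, and `neck` are all assumptions made for illustration.

```python
import math
from typing import Dict, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v: Vec) -> Vec:
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("degenerate skeleton: cannot build a body frame")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def to_body_frame(joints: Dict[str, Vec]) -> Dict[str, Vec]:
    """Express every joint in a frame attached to the torso (assumed layout)."""
    left_hip, right_hip, neck = joints["left_hip"], joints["right_hip"], joints["neck"]
    origin = tuple((l + r) / 2.0 for l, r in zip(left_hip, right_hip))
    x_axis = _unit(_sub(left_hip, right_hip))              # across the hips
    z_axis = _unit(_cross(x_axis, _sub(neck, origin)))     # normal to the torso plane
    y_axis = _cross(z_axis, x_axis)                        # roughly along the spine
    return {
        name: (_dot(_sub(p, origin), x_axis),
               _dot(_sub(p, origin), y_axis),
               _dot(_sub(p, origin), z_axis))
        for name, p in joints.items()
    }
```

Because the axes are derived from the skeleton itself, rotating or translating the camera leaves the converted coordinates unchanged, which is what makes such a representation usable from unseen viewpoints.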
- Research Article
- 10.1016/j.engappai.2025.113522
- Feb 1, 2026
- Engineering Applications of Artificial Intelligence
- Yanhong Jie + 4 more
A novel helicopter flight action recognition method based on flight parameter data processing
- Research Article
- 10.1038/s41598-025-34401-9
- Jan 2, 2026
- Scientific Reports
- Jinzhu Zeng + 2 more
Human Activity Recognition (HAR) plays a significant role in the field of health monitoring. By accurately identifying common activities such as walking, going up or down stairs, sitting, standing, and lying down, continuous tracking and analysis of an individual’s behavioral state can be achieved. This is of great importance for health monitoring and intelligent healthcare. However, due to the noise in data collected by wearable sensors and the variability in data distribution, existing HAR methods still face limitations in accuracy and generalization ability, making it difficult to maintain stable and reliable early-warning performance across diverse motion scenarios and individual differences. To address these challenges, a human activity recognition method, ASSAFormer, is proposed in this paper. ASSAFormer integrates mode decomposition, heuristic optimization algorithms, and an improved Transformer for health monitoring. During data preprocessing, variational mode decomposition (VMD) is employed to filter noise from sensor data sequences, while the Whale Optimization Algorithm (WOA) optimizes the number of decomposition modes and the penalty factor, thereby mitigating the parameter sensitivity issue in mode decomposition. In terms of network architecture design, Adaptive Sparse Self-Attention (ASSA) and Contrastive Normalization (ContraNorm) are introduced into the vanilla Transformer. Firstly, the self-attention mechanism in the Transformer is prone to introducing weakly correlated interference and leading to overfitting. Therefore, an Adaptive Sparse Self-Attention (ASSA) mechanism is proposed. The Sparse Self-Attention (SSA) branch of ASSA filters query-key matching scores, allowing only highly relevant information to pass through, thereby reducing noise interference. Meanwhile, the Dense Self-Attention (DSA) branch of ASSA retains weakly relevant yet useful information that might be overlooked due to excessive sparsification. Secondly, ContraNorm is introduced to alleviate dimensional collapse, enabling better implicit dispersion of representations in the feature space. Comparative experiments demonstrate that the proposed method achieves the best performance on both the UCI and URFD datasets, and the ablation studies further validate the effectiveness of the improved modules.
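The two-branch attention described in the abstract above can be sketched in plain Python. The abstract does not specify the filtering function or how the branches are fused, so the squared ReLU in the sparse branch, the softmax in the dense branch, and the two fusion weights `w_sparse` and `w_dense` are assumptions for illustration only.

```python
import math
from typing import List

Matrix = List[List[float]]


def _softmax(row: List[float]) -> List[float]:
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def assa_attention(q: Matrix, k: Matrix, v: Matrix,
                   w_sparse: float = 0.0, w_dense: float = 0.0) -> Matrix:
    """Fuse a sparse and a dense attention branch (assumed formulation)."""
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    # Sparse branch: squared ReLU zeroes out negatively matched query-key pairs.
    sparse = [[max(s, 0.0) ** 2 for s in row] for row in scores]
    # Dense branch: ordinary softmax keeps weakly related keys with small weight.
    dense = [_softmax(row) for row in scores]
    # Normalize the two (in practice learnable) branch weights so they sum to 1.
    e_s, e_d = math.exp(w_sparse), math.exp(w_dense)
    a_s, a_d = e_s / (e_s + e_d), e_d / (e_s + e_d)
    attn = [[a_s * s + a_d * d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(sparse, dense)]
    # Weighted sum of value vectors for each query.
    return [[sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
            for row in attn]


# One query, two keys: the second key points away from the query,
# so the sparse branch ignores it while the dense branch still sees it.
out = assa_attention(q=[[1.0, 0.0]], k=[[1.0, 0.0], [-1.0, 0.0]], v=[[1.0], [0.0]])
print(out)
```

In a trained model the fusion weights would be learned, letting the network decide per layer how much sparsification to apply.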
- Research Article
- 10.1109/tnsm.2026.3671357
- Jan 1, 2026
- IEEE Transactions on Network and Service Management
- Xing Li + 4 more
In next-generation communication networks and Industry 5.0-based applications, ensuring robust security and reliability in human-computer interaction (HCI) constitutes a fundamental prerequisite for safety-critical AI machine systems. Point cloud sequence-based human action recognition demonstrates intrinsic advantages in privacy-preserving HCI, leveraging its non-intrusive sensing modality to mitigate data vulnerability while maintaining high-precision action interpretation in industrial environments. Existing spatio-temporal encoding methods for point cloud sequence-based action recognition suffer from two fundamental limitations: (1) rigid neighborhood constraints impair multi-scale feature extraction for heterogeneous body parts, and (2) independent spatial-temporal decomposition introduces motion representation distortion. We propose a Meta-motion Decoupling Point Cloud Sequence Network (MD-PCSN) that addresses these challenges through: (1) logarithmic spatio-temporal point convolution for hierarchical meta-motion construction at variable granularities, and (2) a novel Gated-KANsformer architecture with differential motion encoding to explicitly model both short-term displacements and long-term spatio-temporal dependencies. The proposed meta-motion decoupling mechanism significantly enhances robustness against sensor perturbations, making the framework particularly suitable for security-critical applications. Extensive experiments on three benchmark datasets demonstrate MD-PCSN’s superior performance. It outperforms the classic PST-Transformer by 1.5% on MSR Action3D and 4.14% on UTD-MHAD. On NTU RGB+D 60, it achieves a 2.9% cross-view gain over the latest PointActionCLIP.
- Research Article
- 10.1109/lsens.2026.3669010
- Jan 1, 2026
- IEEE Sensors Letters
- Shufeng Gong + 5 more
To enhance road traffic safety, accurate recognition of continuous driving behaviors is crucial. Addressing the limitation of existing methods that predominantly focus on isolated action recognition, this paper proposes a continuous driving action segmentation and recognition method based on millimeter-wave radar. Firstly, micro-Doppler maps and energy distribution maps are generated using techniques such as FFT and OS-CFAR. Building on this, a threshold segmentation algorithm based on energy distribution is proposed, which achieves precise segmentation of continuous actions with variable durations by automatically detecting their actual start and end points, thereby obtaining a micro-Doppler map for each driving action. To handle the time-varying characteristics of the segmented actions, a hybrid neural network named IRT is designed and implemented. This network utilizes InceptionResNetV2 as its backbone, integrates Transformer modules to capture long-range dependencies, and incorporates inverted residual blocks to optimize feature extraction efficiency. Experimental results show that the proposed method can effectively segment and recognize multiple sets of continuous driving actions, achieving an average recognition accuracy of 97.30%, which demonstrates its effectiveness and reliability.
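The idea of finding action start and end points from an energy profile, as described in the abstract above, can be illustrated with a simple fixed-threshold scan. This is a sketch only: the abstract does not say how the threshold is chosen, and `segment_by_energy`, `threshold`, and `min_len` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def segment_by_energy(energy: Sequence[float], threshold: float,
                      min_len: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges where energy exceeds threshold.

    Runs shorter than min_len frames are discarded as spurious bursts.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(energy):
        if value > threshold and start is None:
            start = i                      # rising edge: an action begins
        elif value <= threshold and start is not None:
            if i - start >= min_len:       # falling edge: close the segment
                segments.append((start, i))
            start = None
    # Close a segment that is still open when the recording ends.
    if start is not None and len(energy) - start >= min_len:
        segments.append((start, len(energy)))
    return segments


# Made-up per-frame energies containing two bursts of activity.
frames = [0, 0, 5, 6, 7, 0, 0, 1, 8, 9, 0]
print(segment_by_energy(frames, threshold=2))  # [(2, 5), (8, 10)]
```

Each returned range could then be used to crop the corresponding slice of a micro-Doppler map before classification.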
- Research Article
- 10.1049/icp.2025.4674
- Jan 1, 2026
- IET Conference Proceedings
- Zijian Chen
To address the need for refined analysis of action phase recognition and performance prediction in long jump, this paper proposes a multi-task learning framework that integrates an Inflated 3D Convolutional Network (I3D) and a self-attention-based Transformer model. This approach first extracts 3D convolutional features from raw long jump videos using an I3D network to capture the spatiotemporal linkage characteristics of key phases. The Transformer model is then introduced to construct long-term dependencies, improving the model's understanding of movement transitions. Furthermore, a unified multi-task output module is designed to achieve collaborative prediction of action phase classification and long jump performance regression. Experimental results demonstrate that the proposed model achieves an accuracy of 92.3% in action recognition and a mean absolute error (MAE) of less than 0.13 meters in performance prediction, significantly outperforming several competing methods. Further attention visualization and residual analysis validate the model's strong focus on key action nodes and its cross-temporal modeling capabilities. This work provides an effective technical approach for athletic performance assessment and intelligent motion analysis.
- Research Article
- 10.1109/jbhi.2026.3651261
- Jan 1, 2026
- IEEE Journal of Biomedical and Health Informatics
- Xinhua Fan + 3 more
As the aging population grows and more elderly individuals live independently, the demand for reliable, unobtrusive home health monitoring becomes increasingly important. Existing in-home health monitoring systems often face limitations such as privacy concerns, dependence on unreliable wearable devices, degraded accuracy in complex environments, and lack of continuous monitoring capability. To address these challenges, we propose a long-term home health monitoring system that primarily relies on audio sensing, supplemented by other noninvasive modalities. Our approach is able to accurately detect and recognize overlapping acoustic events with fine-grained temporal resolution, surpassing conventional audio-based methods for activity recognition. The system incorporates a transformer-based time-frequency fusion module and a category dynamic threshold strategy to improve detection performance under semi-supervised conditions. Experiments on a real-world dataset demonstrate that our method outperforms existing baselines, achieving PSDS$_{1}$, PSDS$_{2}$, and EB-F1 scores of 0.581, 0.930 and 55.1%, with improvements of 0.054, 0.019, and 2.3%, respectively. In addition, a 30-day field deployment involving 10 elderly participants confirms the robustness and practicality of the system for real-world applications. By allowing continuous passive monitoring of daily activities and abnormal acoustic events, our system has significant potential for early detection of health risks, behavioral anomalies, and long-term wellness tracking in aging-in-place scenarios.
- Research Article
- 10.1109/tifs.2025.3650396
- Jan 1, 2026
- IEEE Transactions on Information Forensics and Security
- Jinsheng Xiao + 5 more
To provide timely warnings and prevent potential damage, it is crucial to detect anomalous actions that threaten public safety through surveillance cameras. Compared to normal actions, anomalous actions often occupy only a small portion of surveillance videos and exhibit more complex manifestations in terms of time and space. Considering that normal action recognition methods fail to highlight crucial information from small-sized patches, we propose the Spatio-temporal Key Patch Selection Network (STKPS-Net). It includes a spatially adaptive key patch selection module to select small but informative patches, and a long-short feature map spatio-temporal relation module to capture dynamic changes in anomalous actions. Additionally, a spatio-temporal refined loss is introduced to enhance fine-grained feature learning. Experimental results on the HMDB51, Kinetics, and UCF-Crime v2 datasets show that our STKPS-Net achieves state-of-the-art performance in few-shot anomalous action recognition, outperforming the most competitive methods by 1.2% on the anomalous action dataset UCF-Crime v2.
- Research Article
- 10.63313/ajet.9026
- Dec 24, 2025
- Academic Journal of Emerging Technologies
- Fan Zhu
With the popularity of wearable devices and IoT technology, sensor-based human activity recognition is valuable in fields like smart healthcare. However, traditional deep learning requires large labeled datasets and struggles with new categories, users, or environments where data is scarce. To address this, we propose a Few-Shot Transfer Learning method for Human Activity Recognition (FTL-HAR). It leverages pre-trained models to transfer knowledge, enabling adaptation to new categories with minimal data. Experiments on public datasets (PAMAP2, OPPORTUNITY) under 1-shot and 5-shot settings show that FTL-HAR significantly outperforms traditional methods by effectively utilizing pre-trained features for rapid fine-tuning.
- Research Article
- 10.1038/s41598-025-27450-7
- Dec 12, 2025
- Scientific Reports
- Hend Khalid Alkahtani + 3 more
Human activity recognition (HAR) has numerous applications owing to the widespread use of data acquisition devices, such as smartphones and video cameras, that can capture data on human activity. HAR has become a hot research area in the computer vision (CV) domain. It plays a part in the expansion of many substantial applications, namely video surveillance, home monitoring, security, virtual reality, and human–computer interaction. Subsequently, a wide range of activity recognition methods were developed for individuals with disabilities. HAR is the technique of naming and recognizing actions using artificial intelligence (AI)-based deep learning (DL) methodologies. DL models are crucial to the activity recognition process for individuals with disabilities and older people. This paper presents an Optimised Hybrid Deep Learning Model for Human Activity Recognition Using Metaheuristic Optimisation Algorithms (OHDLM-HARMOA). The aim is to develop an effective HAR method that assists and improves the quality of life for people with disabilities through accurate activity monitoring. Initially, the data pre-processing stage applies Z-score normalization to convert the input data into a structured pattern. For the feature selection process, the ant colony optimization (ACO) model is employed to select the most relevant and significant features from a dataset. Furthermore, the OHDLM-HARMOA model utilizes the hybridization of a convolutional neural network and a bidirectional gated recurrent unit with attention (CNN-BiGRU-A) technique for classification. Finally, the parameter tuning process is performed using the Sine–Cosine Algorithm (SCA) technique to enhance the classification performance of the CNN-BiGRU-A model. The experimental evaluation of the OHDLM-HARMOA approach is performed on the WISDM dataset. The comparison analysis showed that the OHDLM-HARMOA approach achieved a superior accuracy of 99.00% over existing models.
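The Z-score normalization step named in the abstract above rescales each feature to zero mean and unit standard deviation. A minimal sketch follows, assuming the data arrive as rows of numeric features; `zscore_columns` is a hypothetical helper, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standardize each feature column: z = (x - mean) / std."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; dividing by 1.0 maps it to all zeros.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Two made-up sensor samples with two features each.
samples = [[1.0, 10.0], [3.0, 10.0]]
print(zscore_columns(samples))  # [[-1.0, 0.0], [1.0, 0.0]]
```

In practice the means and standard deviations would be computed on the training split only and reused for validation and test data.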
- Research Article
- 10.1016/j.jmsy.2025.09.007
- Dec 1, 2025
- Journal of Manufacturing Systems
- Lili Dong + 4 more
RGB video and inertial sensing fusion method for human action recognition in human-robot collaborative manufacturing
- Research Article
- 10.1016/j.jestch.2025.102230
- Dec 1, 2025
- Engineering Science and Technology, an International Journal
- Xiaoxu Wen + 6 more
DC-PFL: A dynamic clustering-based personalized federated learning method for human activity recognition
- Research Article
- 10.1007/s11276-025-04053-8
- Nov 26, 2025
- Wireless Networks
- Jiai He + 2 more
Privacy-preserving activity recognition method based on WiFi signals: adaptive CSI enhancement and deep reinforcement learning