Decision-enhanced Hierarchical Feature Fusion for Accurate Human Activity Recognition
Human Activity Recognition (HAR) is emerging as a critical enabler of context-aware applications in healthcare, fitness, and smart environments. In this research, we present the Hierarchical Fusion with Decision Enhancement for Human Activity Recognition (Hi-FuseDE-HAR) framework. It contains four sequential hierarchical levels that transform raw sensor signals into a reliable HAR decision. At the input level, heterogeneous data streams are obtained from multiple wearable and ambient sensors. Level 0 builds a discriminative latent embedding for each sensor modality separately. Level 1 fuses sensors in groups and determines each modality's relative contribution to the overall feature importance. Level 2 applies a Graph Cross-Modal Transformer that learns the relationships between sensor groups, producing a globally consistent fused representation. Level 3 provides decision enhancement through uncertainty calibration and utility-aware optimization to make the final estimates reliable. Experimental results indicate that the proposed framework achieves 97.6% accuracy and 96.7% F1-score on the PAMAP2 dataset, 95.5% accuracy and 93.2% F1-score on the OPPORTUNITY dataset, and 96.5% accuracy and 95.2% F1-score on the MHEALTH dataset. Notably, Hi-FuseDE-HAR retains strong performance, confirming its capability to generalize across varied sensor contexts and complex activity patterns.
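The abstract does not say how Level 3 calibrates uncertainty. As one common possibility, and purely as an assumption, temperature scaling divides the classifier logits by a scalar before the softmax so that over-confident predictions are softened without changing the predicted class. The function name, the example logits, and the temperature value below are hypothetical and are not taken from Hi-FuseDE-HAR.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, scaled by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum to avoid overflow in exp()
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three activity classes.
logits = [4.0, 1.0, 0.5]
raw = softmax(logits)                          # uncalibrated probabilities
calibrated = softmax(logits, temperature=2.0)  # temperature > 1 softens the peak
```

In practice the temperature would be fitted on held-out validation data; here it is fixed only to show the effect.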
- Research Article
34
- 10.1016/j.eswa.2009.05.057
- May 22, 2009
- Expert Systems with Applications
A flexible sequence alignment approach on pattern mining and matching for human activity recognition
- Research Article
29
- 10.1155/2022/8383461
- Oct 5, 2022
- Security and Communication Networks
Automatic human activity recognition is one of the milestones of smart city surveillance projects. Human activity detection and recognition aim to identify the activities based on the observations that are being performed by the subject. Hence, vision-based human activity recognition systems have a wide scope in video surveillance, health care systems, and human-computer interaction. Currently, the world is moving towards a smart and safe city concept. Automatic human activity recognition is the major challenge of smart city surveillance. The proposed research work employed fine-tuned YOLO-v4 for activity detection, whereas for classification purposes, 3D-CNN has been implemented. Besides the classification, the presented research model also leverages human-object interaction with the help of intersection over union (IOU). An Internet of Things (IoT) based architecture is implemented to take efficient and real-time decisions. The dataset of exploit classes has been taken from the UCF-Crime dataset for activity recognition. At the same time, the dataset extracted from MS-COCO for suspicious object detection is involved in human-object interaction. This research is also applied to human activity detection and recognition in the university premises for real-time suspicious activity detection and automatic alerts. The experiments have exhibited that the proposed multimodal approach achieves remarkable activity detection and recognition accuracy.
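Intersection over union (IoU), which the abstract uses for human-object interaction, can be sketched as follows. The box format `(x1, y1, x2, y2)` and the function name are assumptions made for illustration; the paper's own implementation is not described here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical person and object boxes.
person_box = (0.0, 0.0, 2.0, 2.0)
object_box = (1.0, 1.0, 3.0, 3.0)
overlap = iou(person_box, object_box)
```

A system of this kind might flag an interaction when the overlap exceeds a chosen threshold; the threshold itself is not given in the abstract.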
- Conference Article
20
- 10.1109/sas.2018.8336752
- Mar 1, 2018
Wearable sensors are widely utilized in human activity monitoring and recognition systems. Not only do these sensors come in different form factors, but the software that comes bundled with them also varies from device to device and is constantly evolving. Also, multiple types of sensors on these devices are used to recognize human activity. Owing to the flexible form factor, these devices can also be mounted at many different positions on the human body. With all the aforementioned variables, it becomes imperative that the quality of data provided by wearable sensors be evaluated. This paper describes an empirical study that evaluates the accuracy of human activity recognition by wearable sensors based on the type of sensor, the physical mounting position of the sensor on the human body, the type of activity being monitored, and the type of device being used. The paper further assesses the results of this study and provides guidelines for designing better wearable sensor systems for human activity recognition.
- Conference Article
4
- 10.1109/icaiic.2019.8669069
- Feb 1, 2019
Human action recognition has great research and application value in intelligent video surveillance, human-computer interaction and other communication fields. In order to improve the accuracy of human action recognition for video understanding, the extraction of human motion features and attentional fusion methods are studied. This paper has two main contributions. Firstly, based on the essence of optical flow validity, a novel dynamic feature expression method called Human-Object Contour (HOC) is presented, which combines object understanding and contextual information. Secondly, referring to the principle of Stacking in ensemble learning, we propose the Attentional Multi-modal Fusion Network (AMFN). According to the characteristics of the video, attention is used to select among different modalities rather than simply averaging them with fixed weights. The experiments show that HOC is effectively complementary to the static appearance feature, and that the accuracy of action recognition improves with our fusion network. Our approach obtains state-of-the-art performance on the HMDB51 (72.2%) and UCF101 (96.0%) datasets.
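The contrast between attention-based selection and fixed-weight averaging can be illustrated with a minimal sketch. This is not the AMFN architecture: the attention scores are simply passed in as numbers, whereas a real network would learn them from the video, and all names and values are hypothetical.

```python
import math


def attention_fuse(modality_probs, attention_scores):
    """Fuse per-modality class probabilities with softmax attention weights.

    modality_probs: one probability vector per modality.
    attention_scores: one unnormalized score per modality (assumed given).
    """
    peak = max(attention_scores)
    exps = [math.exp(s - peak) for s in attention_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    n_classes = len(modality_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, modality_probs))
        for c in range(n_classes)
    ]
    return fused, weights


# Hypothetical appearance and motion predictions for two classes.
appearance = [0.8, 0.2]
motion = [0.4, 0.6]
averaged, _ = attention_fuse([appearance, motion], [0.0, 0.0])  # equal scores: plain average
attended, weights = attention_fuse([appearance, motion], [2.0, 0.0])  # favors appearance
```

With equal scores the result reduces to simple averaging, which makes the fixed-weight baseline a special case of the attentional one.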
- Research Article
- 10.12783/dtcse/iece2018/26598
- Dec 7, 2018
- DEStech Transactions on Computer Science and Engineering
Traditional pattern recognition algorithms for human activity recognition observe data from all available sensors and obtain feature vectors of a fixed dimension. Pattern classification is performed in the vector space of the same dimension. As a result, the computational cost increases as the number of sensors increases for higher accuracy, and the algorithm fails if some of the sensors are offline. As such, data-driven activity recognition suffers from poor scalability and reusability. There is therefore a great need for a recognition framework with high flexibility and low obtrusiveness, especially for older people, that can not only monitor human acceleration data but also provide information about the person's environment and even object usage. Prior information, such as a person's routine, which is relatively stable in his or her life, can be used as knowledge to design a powerful human-computer collaboration system for human activity recognition. In this framework, we make a guess from the user's routine knowledge and then use the relevant sensor data for each individual to validate the result; the result may be confirmed or changed according to the decision criteria. The experimental results showed that the accuracy of human activity recognition can reach up to 90%. In particular, when some sensors are offline, the model can still achieve good results.
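The guess-then-validate idea can be sketched as below. The abstract does not state its decision criteria, so the agreement threshold, the majority fallback, and every name here are assumptions chosen only to show how offline sensors can be tolerated.

```python
def recognize(routine_guess, sensor_votes, agreement_threshold=0.5):
    """Validate a routine-based guess against whichever sensors are online.

    sensor_votes maps a sensor name to its predicted activity, or to None
    when that sensor is offline (hypothetical convention).
    """
    online = [vote for vote in sensor_votes.values() if vote is not None]
    if not online:
        # No sensor evidence at all: fall back on the routine prior.
        return routine_guess
    agreement = sum(1 for vote in online if vote == routine_guess) / len(online)
    if agreement >= agreement_threshold:
        return routine_guess
    # Otherwise change the result to the most common online prediction.
    counts = {}
    for vote in online:
        counts[vote] = counts.get(vote, 0) + 1
    return max(counts, key=counts.get)


confirmed = recognize("cooking", {"wrist": "cooking", "kitchen_motion": None})
changed = recognize("cooking", {"wrist": "walking", "hall_motion": "walking", "kitchen_motion": "cooking"})
all_offline = recognize("sleeping", {"wrist": None, "bed_pressure": None})
```

Because only online sensors are counted, the decision does not depend on a fixed-length feature vector.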
- Research Article
34
- 10.1007/s41870-021-00719-6
- Jan 1, 2021
- International Journal of Information Technology
Identification of human physical activities has long been an active research area due to its application in personalized health and fitness monitoring. The accuracy of human activity recognition (HAR) models mainly depends on the features, which are extracted using domain knowledge. The features are the input of the classification algorithm that identifies human physical activities. Manually extracted (handcrafted) features need expert domain knowledge. Thus these features have significant importance in identifying different human activities. Recently, deep learning methods have been utilized to extract features automatically from raw sensory data for HAR models. However, state-of-the-art HAR literature has established that the importance of handcrafted features cannot be ignored, as they are extracted using expert domain knowledge. Thus, in this paper we fuse handcrafted features with features extracted automatically by deep learning (DL) to enhance the performance of the HAR model. Extensive experimental results demonstrate that our proposed feature-fusion-based HAR model gives higher accuracy compared with state-of-the-art HAR literature on both the self-collected and public datasets.
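One simple way to fuse the two feature families is to concatenate them after normalizing each block. The abstract does not specify the fusion operator, so concatenation, the L2 normalization, the four chosen statistics, and the placeholder embedding are all assumptions of this sketch.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so neither feature block dominates."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def handcrafted_features(window):
    """Basic statistics of one sensor window: mean, std, min, max."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    return [mean, std, min(window), max(window)]


def fuse_features(window, learned_embedding):
    """Concatenate normalized handcrafted and learned feature blocks."""
    return l2_normalize(handcrafted_features(window)) + l2_normalize(learned_embedding)


window = [0.1, 0.4, 0.3, 0.9, 0.2]    # hypothetical accelerometer samples
learned_embedding = [0.5, -1.2, 0.7]  # placeholder for a deep network's output
fused = fuse_features(window, learned_embedding)
```

The fused vector would then be passed to a classifier in place of either block alone.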
- Research Article
- 10.1038/s41598-025-19496-4
- Oct 13, 2025
- Scientific Reports
The challenge of providing independent living for elderly and disabled individuals is a critical societal concern. Accurate human activity recognition (HAR) is core to allow the development of context-aware applications that involve the identification and understanding of human behaviour, for example, monitoring elderly or disabled people who live alone. HAR performed using ambient sensors, such as cameras or wearable devices, has gained prominence due to its wide-ranging applications in healthcare, surveillance, smart environments, and security. However, choosing an appropriate AI model for precisely interpreting intrinsic human activities remains a key challenge in the field. And, it is beneficial for people with disabilities or the elderly to live independently. Currently, the methods of artificial intelligence (AI) for activity recognition, an optimal application area, and the form of data acquisition devices make the selections more complex. Different researchers applied deep learning (DL) techniques in HAR. At present, DL has achieved remarkable results in developing high-level ideas from composite data in various fields like HAR. In this study, a Hybrid Deep Learning with an Attention Mechanism for Automatic Human Activity Recognition Using Swin Transformer (HDLAM-AHARST) model is proposed. The aim is to design an intelligent HAR system to assist individuals with disabilities by enabling accurate and real-time monitoring for improved quality of life. Initially, the Gabor filter (GF) method is utilized in the image pre-processing step to eliminate noise and enhance image quality. Furthermore, the Swin Transformer (SwinT) method is utilized for the feature extraction process to identify and transform relevant information from data. Moreover, the hybridization of a convolutional neural network and a long short-term memory with an attention mechanism (C-LSTM-A) is employed for the HAR classification process. 
Finally, the hyperparameters of the C-LSTM-A model are selected using the Lyrebird Optimisation Algorithm (LOA). The HDLAM-AHARST technique is evaluated on the HAR image dataset. In the comparison study, the HDLAM-AHARST technique achieved an accuracy of 98.91%, exceeding existing methods.
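For the Gabor filter pre-processing step, a kernel can be built from the standard real-valued Gabor formula: a Gaussian envelope multiplied by a cosine carrier. The parameter values below are illustrative defaults, not the settings used for HDLAM-AHARST, and an odd kernel size is assumed.

```python
import math


def gabor_kernel(size=5, sigma=2.0, theta=0.0, wavelength=4.0, gamma=0.5, psi=0.0):
    """Real Gabor kernel as a list of rows; size is assumed to be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by the filter orientation theta.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_rot / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


kernel = gabor_kernel()
```

Convolving an image with kernels at several orientations emphasizes edges and textures along those directions.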
- Conference Article
13
- 10.1109/icpai51961.2020.00039
- Dec 1, 2020
Human action recognition plays an important role in video surveillance, human-computer interaction, video understanding, and virtual reality. Unlike two-dimensional object recognition, human action recognition is a dynamic recognition task with a time-series relationship, and it faces many challenges from complex environments, such as color shift, light and shadow changes, and sampling angles. In order to improve the accuracy of human action recognition, many studies have proposed skeleton-based action recognition methods that are not affected by the background, but current frameworks give little attention to the integration of the time dimension. In this paper, we propose a novel SlowFast-GCN framework which combines the advantages of ST-GCN and SlowFastNet with a dynamic human skeleton to improve the accuracy of human action recognition. The proposed framework uses two streams: one stream captures fine-grained motion changes, and the other captures static semantics. Through these two streams, we can merge the human skeleton features from two different time dimensions. Experimental results show that the proposed framework outperforms state-of-the-art approaches on the NTU-RGBD dataset.
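The two time dimensions can be illustrated by sampling the same skeleton sequence at two frame rates, in the spirit of SlowFast-style designs. The stride values and names are assumptions; the abstract does not give the sampling rates used in SlowFast-GCN.

```python
def two_rate_sample(frames, slow_stride=8, fast_stride=2):
    """Return (slow, fast) views of a frame sequence.

    The sparse slow view is meant for static semantics and the dense
    fast view for fine-grained motion (stride values are hypothetical).
    """
    return frames[::slow_stride], frames[::fast_stride]


# Stand-in for 32 skeleton frames; real frames would hold joint coordinates.
frames = list(range(32))
slow_view, fast_view = two_rate_sample(frames)
```

Each view would be fed to its own graph-convolution stream before the features are merged.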
- Research Article
9
- 10.1002/cpe.6137
- Mar 9, 2021
- Concurrency and Computation: Practice and Experience
Summary: As an important technology in computer vision, video-based human action recognition has great commercial value and has attracted extensive attention in the field of computer vision and pattern recognition in both academia and industry. To date, there are a wide variety of applications of human action recognition, such as surveillance, robotics, health care, video searching, and human–computer interaction. However, there are many challenges involved in human action recognition in videos, such as cluttered backgrounds, occlusions, viewpoint variation, execution rate, and camera motion. In addition, data redundancy and reliance on a single feature have largely limited the accuracy of human action recognition. In this article, a novel action recognition method is proposed that adopts key frame extraction and multifeature fusion techniques to improve the recognition accuracy. The main contributions are as follows: 1) in order to solve the problem of data redundancy, a key frame extraction method based on node contribution weighting is proposed to extract video key frames; 2) different kinds of information flows are extracted from the obtained key frame sequences, and different convolutional neural networks are used to obtain the corresponding classification results, which are then merged so that the information in the different flows is better complemented. Lastly, the experimental results show that our method improves the accuracy of action recognition.
- Research Article
48
- 10.1016/j.neucom.2020.04.150
- Nov 24, 2020
- Neurocomputing
A fast human action recognition network based on spatio-temporal features
- Research Article
9
- 10.1007/s42486-020-00042-2
- Oct 6, 2020
- CCF Transactions on Pervasive Computing and Interaction
Human activity recognition (HAR) in wearable devices is a promising technology in pervasive computing. However, the traditional method often regards human activity recognition as a single-label recognition problem, ignoring the association between the current activity mode, personal motion mode and sensor wearing position. This paper proposes a multi-task learning framework for human activity recognition based on supervised learning, which considers not only the activity but also the identity of the wearer, gender and the position of the sensor on the body. We extracted the time-domain and frequency-domain features of the original data, and classified the data through a multi-task learning framework composed of a fully connected network and a convolutional neural network. We employ a public data set composed of 15 experimenters, 8 movements and 7 body positions. Only 30% of the data is used to train the model, which can achieve high precision. The experimental results show that the classification accuracy of activity recognition can reach 90.8%, body position recognition can reach 98.7%, wearer identity recognition can reach 97.5%, and gender recognition can reach 98.7%. We call the model trained with 30% of the data a pre-trained model, and then fine-tune the pre-trained model on personal data. Fine-tuning the pre-trained model on personal data can achieve up to 95.6% activity recognition accuracy.
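Time-domain and frequency-domain feature extraction from a sensor window might look like the sketch below. The abstract does not list the features used, so the chosen statistics, the dominant-frequency feature, and the plain discrete Fourier transform are assumptions made for illustration.

```python
import cmath
import math


def time_features(window):
    """Mean and standard deviation of one sensor window."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    return {"mean": mean, "std": std}


def dominant_frequency(window, sample_rate_hz):
    """Frequency in Hz of the strongest non-DC component, via a direct DFT."""
    n = len(window)
    mean = sum(window) / n
    best_bin, best_magnitude = 0, 0.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            (x - mean) * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(window)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin * sample_rate_hz / n


# One second of a synthetic 2 Hz movement sampled at 32 Hz.
window = [math.sin(2 * math.pi * 2 * i / 32) for i in range(32)]
stats = time_features(window)
peak_hz = dominant_frequency(window, sample_rate_hz=32)
```

The direct DFT is quadratic in the window length; a production pipeline would normally use an FFT routine instead.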
- Research Article
2
- 10.47992/ijmts.2581.6012.0318
- Nov 23, 2023
- International Journal of Management, Technology, and Social Sciences
Purpose: The objective of this research article is to methodically combine the existing literature on Human Activity Recognition (HAR) and provide an understanding of the present state of the HAR literature. Additionally, the article aims to suggest an appropriate HAR system that can be used for detecting real-time activities such as suspicious behavior, surveillance, and healthcare. Objective: This review study intends to delve into the current state of human activity detection and recognition methods, while also pointing towards promising avenues for further research and development in the field, particularly with regards to complex and multi-task human activity recognition across different domains. Design/Methodology/Approach: A systematic literature review methodology was adopted by collecting and analyzing the required literature available from international and national journals, conferences, databases and other resources searched through the Google Scholar and other search engines. Findings/Result: The systematic review of literature uncovered the various approaches of Human activity detection and recognition. Even though the prevailing literature reports the investigations of several aspects of Human activity detection and recognition, there is still room for exploring the role of this technology in various domains to enhance its robustness in detecting and recognizing of multiple human actions from preloaded CCTV cameras, which can aid in detecting abnormal and suspicious activities and ultimately reduce aberrant human actions in society. Originality/Value: This paper follows a systematic approach to examine the factors that impact the detection and recognition of Human activity and suggests a concept map. The study undertaken supplements the expanding literature on knowledge sharing highlighting its significance. Paper Type: Review Paper.
- Research Article
10
- 10.1155/2021/2311594
- Jan 1, 2021
- Journal of Sensors
The purpose is to study the interactive teaching mode of human action recognition technology in music and dance teaching under computer vision. The human action detection and recognition system based on a three‐dimensional (3D) convolutional neural network (CNN) is established. Then, a human action recognition model based on the dual channel is proposed on the basis of CNN, and the visual attention mechanism using the interframe differential channel is introduced into the model. Through experiments, the performance of the system in the process of human dance image recognition based on the Kungliga Tekniska Högskolan (KTH) dataset is verified. The results show that the dual‐channel 3D CNN human action recognition system can achieve high accuracy in the first few rounds of training after the frame difference channel is added, the error can be reduced quickly, and the convergence can start quickly; the recognition accuracy of the system on KTH dataset is 96.6%, which is higher than that of other methods; for 3 × 3 × 3 basic convolution kernel, the best performance of the classification network can be obtained by pushing forward 0.0091 seconds in the calculation. Thereby, the dual‐channel 3D CNN recognition system has good human action recognition accuracy in the dance interactive teaching mode of music teaching.
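The interframe differential channel can be sketched as an absolute difference between consecutive grayscale frames. Representing frames as nested lists of pixel intensities is an assumption for illustration; how the paper feeds this channel into the 3D CNN is not described here.

```python
def frame_difference(previous_frame, current_frame):
    """Per-pixel absolute difference between two equally sized grayscale frames."""
    return [
        [abs(current - previous) for previous, current in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous_frame, current_frame)
    ]


# Hypothetical 2x3 frames: only one pixel changes between them.
previous_frame = [[10, 10, 10], [10, 10, 10]]
current_frame = [[10, 50, 10], [10, 10, 10]]
motion_channel = frame_difference(previous_frame, current_frame)
```

Static background pixels map to zero, so the channel highlights where motion occurred.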
- Book Chapter
1
- 10.1007/978-3-030-30645-8_33
- Jan 1, 2019
Accurate recognition of human activities is a challenging research problem in video surveillance within computer vision. The task of recognizing human activities from a video sequence is made more challenging by the need to process data in real time. In this paper, we propose a method for recognition of human activities based on the Daubechies complex wavelet transform (DCxWT). The better edge representation and approximate shift-invariance of DCxWT compared with real-valued wavelet transforms motivate us to use these properties in recognition of human activities. A multi-class SVM is used for classifying the recognized human activities. The proposed method is compared with other state-of-the-art methods on various standard publicly available datasets in terms of different quantitative performance measures. We found that the proposed method has better recognition accuracy than other state-of-the-art methods.
- Research Article
1
- 10.3390/a18040235
- Apr 18, 2025
- Algorithms
Activity recognition and localization in outdoor environments involve identifying and tracking human movements using sensor data, computer vision, or deep learning techniques. This process is crucial for applications such as smart surveillance, autonomous systems, healthcare monitoring, and human–computer interaction. However, several challenges arise in outdoor settings, including varying lighting conditions, occlusions caused by obstacles, environmental noise, and the complexity of differentiating between similar activities. This study presents a hybrid deep learning approach that integrates human activity recognition and localization in outdoor environments using Wi-Fi signal data. The study focuses on applying the hybrid long short-term memory–bi-gated recurrent unit (LSTM-BIGRU) architecture, designed to enhance the accuracy of activity recognition and location estimation. Moreover, experiments were conducted using a real-world dataset collected with the PicoScene Wi-Fi sensing device, which captures both magnitude and phase information. The results demonstrated a significant improvement in accuracy for both activity recognition and localization tasks. To mitigate data scarcity, this study utilized the conditional tabular generative adversarial network (CTGAN) to generate synthetic channel state information (CSI) data. Additionally, carrier frequency offset (CFO) and cyclic shift delay (CSD) preprocessing techniques were implemented to mitigate phase fluctuations. The experiments were conducted in a line-of-sight (LoS) outdoor environment, where CSI data were collected using the PicoScene Wi-Fi sensor platform across four different activities at outdoor locations. Finally, a comparative analysis of the experimental results highlights the superior performance of the proposed hybrid LSTM-BIGRU model, achieving 99.81% and 98.93% accuracy for activity recognition and location prediction, respectively.
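The magnitude and phase information mentioned above can be obtained from complex CSI values as in this minimal sketch. The sample values and function name are hypothetical, and the CFO and CSD corrections described in the abstract are not implemented.

```python
import cmath


def csi_magnitude_phase(csi_samples):
    """Convert complex CSI samples to (magnitude, phase in radians) pairs."""
    return [cmath.polar(sample) for sample in csi_samples]


# Hypothetical CSI values for three subcarriers.
csi_samples = [3 + 4j, 1 + 0j, 0 + 2j]
features = csi_magnitude_phase(csi_samples)
```

Raw phase values wrap into the interval from -pi to pi, which is one reason phase usually needs extra preprocessing before it is used as a feature.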