Attention-guided residual shrinkage with gated recurrent unit for human activity recognition
- Research Article
3
- 10.7717/peerj-cs.1804
- Jan 11, 2024
- PeerJ. Computer science
Human Action Recognition (HAR) is an essential topic in computer vision and artificial intelligence, focused on the automatic identification and categorization of human actions or activities from video sequences or sensor data. The goal of HAR is to teach machines to comprehend and interpret human movements, gestures, and behaviors, allowing for a wide range of applications in areas such as surveillance, healthcare, sports analysis, and human-computer interaction. HAR systems utilize a variety of techniques, including deep learning, motion analysis, and feature extraction, to capture and analyze the spatiotemporal characteristics of human actions. These systems have the capacity to distinguish between various actions, whether they are simple actions like walking and waving or more complex activities such as playing a musical instrument or performing sports maneuvers. HAR continues to be an active area of research and development, with the potential to enhance numerous real-world applications by providing machines with the ability to understand and respond to human actions effectively. In our study, we developed a HAR system to recognize actions in tennis using an attention-based gated recurrent unit (GRU), a prevalent recurrent neural network. The combination of GRU architecture and attention mechanism showed a significant improvement in prediction power compared to two other deep learning models. Our models were trained on the THETIS dataset, one of the standard medium-sized datasets for fine-grained tennis actions. The effectiveness of the proposed model was confirmed by three different types of image encoders: InceptionV3, DenseNet, and EfficientNetB5. The models developed with InceptionV3, DenseNet, and EfficientNetB5 achieved average ROC-AUC values of 0.97, 0.98, and 0.81, respectively. Meanwhile, the models obtained average PR-AUC values of 0.84, 0.87, and 0.49 for InceptionV3, DenseNet, and EfficientNetB5 features, respectively.
The experimental results confirmed the applicability of our proposed method in recognizing action in tennis and may be applied to other HAR problems.
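The attention-pooling step that the abstract credits for the improvement can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function name `attention_pool` and the single learned score vector `w` are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(hidden_states, w):
    """Score each GRU time step, softmax-normalize the scores into
    attention weights, and return the weighted sum of hidden states."""
    scores = hidden_states @ w           # (T,) one score per time step
    alpha = softmax(scores)              # attention weights, sum to 1
    return alpha @ hidden_states, alpha  # context vector (D,), weights (T,)

rng = np.random.default_rng(0)
H = rng.normal(size=(12, 8))   # 12 time steps of 8-dim GRU outputs
w = rng.normal(size=8)         # hypothetical learned score vector
context, alpha = attention_pool(H, w)
```

The context vector, rather than only the last hidden state, would then feed the classification head, letting the model weight the most informative frames of a tennis stroke.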
- Research Article
27
- 10.1109/tcss.2023.3249152
- Feb 1, 2024
- IEEE Transactions on Computational Social Systems
Smart video surveillance plays a significant role in public security by storing huge amounts of continual stream data, evaluating them, and generating warnings where undesirable human activities are performed. Recognition of human activities in video surveillance faces many challenges, such as optimal evaluation of human activities under growing volumes of streaming data with complex computation and high processing-time complexity. To tackle these challenges, we introduce a lightweight spatial-deep features integration using multilayer GRU (SDIGRU). First, we extract spatial and deep features from frame sequences of realistic human activity videos using a lightweight MobileNetV2 model and then integrate those spatial-deep features. Although deep features can be used for human activity recognition, they contain only the high-level appearance, which is insufficient to correctly identify the particular activity of a human. Thus, we jointly apply deep information with spatial appearance to produce detailed-level information. Furthermore, we select rich informative features from the spatial-deep appearances. Then, we train a multilayer gated recurrent unit (GRU) and feed the informative features to learn the temporal dynamics of the human activity frame sequence at each time step of the GRU. We conduct our experiments on the benchmark YouTube11, HMDB51, and UCF101 human activity recognition datasets. The empirical results show that our method achieved significant recognition performance with low computational complexity and quick response. Finally, we compare the results with existing state-of-the-art techniques, which shows the effectiveness of our method.
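The temporal-modeling core of such a pipeline is the GRU cell itself. A minimal NumPy sketch of one GRU step over per-frame feature vectors follows; the weight shapes and the 16-dimensional "pooled MobileNetV2 feature" input are assumptions for illustration, not the paper's configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step: update gate z, reset gate r, candidate state."""
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde           # interpolate old and new

rng = np.random.default_rng(1)
d_in, d_h = 16, 8   # hypothetical per-frame feature and hidden sizes
W = [rng.normal(scale=0.1, size=(d_h, d_in)) for _ in range(3)]
U = [rng.normal(scale=0.1, size=(d_h, d_h)) for _ in range(3)]
h = np.zeros(d_h)
for x in rng.normal(size=(10, d_in)):   # 10 frames of features
    h = gru_step(x, h, W[0], U[0], W[1], U[1], W[2], U[2])
```

Stacking several such layers and feeding the final state to a softmax classifier gives the multilayer-GRU recognizer the abstract describes.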
- Research Article
1
- 10.47992/ijmts.2581.6012.0318
- Nov 23, 2023
- International Journal of Management, Technology, and Social Sciences
Purpose: The objective of this research article is to methodically combine the existing literature on Human Activity Recognition (HAR) and provide an understanding of the present state of the HAR literature. Additionally, the article aims to suggest an appropriate HAR system that can be used for detecting real-time activities such as suspicious behavior, surveillance, and healthcare. Objective: This review study intends to delve into the current state of human activity detection and recognition methods, while also pointing towards promising avenues for further research and development in the field, particularly with regard to complex and multi-task human activity recognition across different domains. Design/Methodology/Approach: A systematic literature review methodology was adopted by collecting and analyzing the required literature available from international and national journals, conferences, databases, and other resources searched through Google Scholar and other search engines. Findings/Result: The systematic review of the literature uncovered various approaches to human activity detection and recognition. Even though the prevailing literature reports investigations of several aspects of human activity detection and recognition, there is still room for exploring the role of this technology in various domains to enhance its robustness in detecting and recognizing multiple human actions from preloaded CCTV cameras, which can aid in detecting abnormal and suspicious activities and ultimately reduce aberrant human actions in society. Originality/Value: This paper follows a systematic approach to examine the factors that impact the detection and recognition of human activity and suggests a concept map. The study undertaken supplements the expanding literature on knowledge sharing, highlighting its significance. Paper Type: Review Paper.
- Research Article
19
- 10.1155/2022/8383461
- Oct 5, 2022
- Security and Communication Networks
Automatic human activity recognition is one of the milestones of smart city surveillance projects. Human activity detection and recognition aim to identify activities from observations of the actions being performed by the subject. Hence, vision-based human activity recognition systems have a wide scope in video surveillance, health care systems, and human-computer interaction. Currently, the world is moving towards a smart and safe city concept. Automatic human activity recognition is the major challenge of smart city surveillance. The proposed research work employed fine-tuned YOLO-v4 for activity detection, whereas for classification purposes, a 3D-CNN has been implemented. Besides the classification, the presented research model also leverages human-object interaction with the help of intersection over union (IoU). An Internet of Things (IoT) based architecture is implemented to take efficient and real-time decisions. The dataset of the exploited classes has been taken from the UCF-Crime dataset for activity recognition. At the same time, the dataset extracted from MS-COCO for suspicious object detection is involved in human-object interaction. This research is also applied to human activity detection and recognition on university premises for real-time suspicious activity detection and automatic alerts. The experiments have exhibited that the proposed multimodal approach achieves remarkable activity detection and recognition accuracy.
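The intersection-over-union cue used here for human-object interaction is a standard computation. A self-contained sketch (box format and thresholding are generic conventions, not details from the paper):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A person box overlapping a suspicious-object box above some
# threshold would be flagged as a human-object interaction.
score = iou((0, 0, 10, 10), (5, 5, 15, 15))
```

A nonzero IoU between a detected person and a detected suspicious object is what links the two detections into an "interaction" event.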
- Research Article
8
- 10.1142/s0218001421520066
- Dec 29, 2020
- International Journal of Pattern Recognition and Artificial Intelligence
Recognition of hand activities of daily living (hand-ADL) is useful in the areas of human–computer interactions, lifelogging, and healthcare applications. However, developing a reliable human activity recognition (HAR) system for hand-ADL with only a single wearable sensor is still a challenge due to hand movements that are typically transient and sporadic. Approaches based on deep learning methodologies to reduce noise and extract relevant features directly from raw data are becoming more promising for implementing such HAR systems. In this work, we present an ARMA-based deep autoencoder and a deep recurrent neural network (RNN) using Gated Recurrent Units (GRUs) for recognition of hand-ADL using signals from a single IMU wearable sensor. The integrated ARMA-based autoencoder denoises raw time-series signals of hand activities, such that a better representation of human hand activities can be made. Then, our deep RNN-GRU recognizes seven hand-ADL based upon the output of the autoencoder: namely, Open Door, Close Door, Open Refrigerator, Close Refrigerator, Open Drawer, Close Drawer, and Drink from Cup. The proposed methodology using RNN-GRU with autoencoder achieves a mean accuracy of 84.94% and an F1-score of 83.05%, outperforming conventional classifiers such as RNN-LSTM, BRNN-LSTM, CNN, and Hybrid-RNNs by 4–10% in both accuracy and F1-score. The experimental results also showed that the use of the autoencoder improves both the accuracy and F1-score of each conventional classifier: by 12.8% in RNN-LSTM, 4.37% in BRNN-LSTM, 15.45% in CNN, 14.6% in Hybrid-RNN, and 12.4% for the proposed RNN-GRU.
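The paper's ARMA-based autoencoder is not reproduced here; as a much simpler stand-in for the denoising idea, the sketch below applies an exponential moving average to a noisy synthetic motion trace. Everything in it (the smoothing factor, the sine-wave "hand-motion" signal) is an illustrative assumption:

```python
import numpy as np

def ema_denoise(signal, alpha=0.2):
    """Exponential moving average: a crude stand-in for a learned
    denoiser, trading a little lag for much lower noise variance."""
    out = np.empty_like(signal, dtype=float)
    out[0] = signal[0]
    for t in range(1, len(signal)):
        out[t] = alpha * signal[t] + (1 - alpha) * out[t - 1]
    return out

rng = np.random.default_rng(2)
t = np.linspace(0, 2 * np.pi, 200)
clean = np.sin(t)                                     # idealized motion trace
noisy = clean + rng.normal(scale=0.3, size=t.size)    # simulated IMU noise
denoised = ema_denoise(noisy)
```

The point of the denoising stage, whether learned or fixed, is the same: the sequence classifier downstream sees a signal closer to the true motion than the raw IMU stream.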
- Research Article
36
- 10.1155/2022/1808990
- Oct 6, 2022
- Computational intelligence and neuroscience
In recent years, research in human activity recognition (HAR) has played a significant role in healthcare systems. The accurate activity classification results from the HAR enhance the performance of the healthcare system with broad applications. HAR results are useful in monitoring a person's health, and the system predicts abnormal activities based on user movements. The HAR system's abnormal activity predictions provide better healthcare monitoring and reduce users' health issues. Conventional HAR systems use wearable sensors, such as inertial measurement units (IMUs) and stretch sensors, for activity recognition. These approaches show remarkable performance on the user's basic activities, such as sitting, standing, and walking. However, when the user performs complex activities, such as running, jumping, and lying, sensor-based HAR systems have a higher degree of misclassification due to reading errors from the sensors. These sensor errors reduce the overall performance of the HAR system, leading to poor classification results. Similarly, radiofrequency- or vision-based HAR systems are not free from classification errors when used in real time. In this paper, we address some of the existing challenges of HAR systems by proposing a human image threshing (HIT) machine-based HAR system that uses an image dataset from a smartphone camera for activity recognition. The HIT machine effectively uses a mask region-based convolutional neural network (R-CNN) for human body detection, a facial image threshing machine (FIT) for image cropping and resizing, and a deep learning model for activity classification. We demonstrated the effectiveness of our proposed HIT machine-based HAR system through extensive experiments and results. The proposed HIT machine achieved 98.53% accuracy when the ResNet architecture was used as its deep learning model.
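The crop-and-resize step between detection and classification can be sketched generically. This is not the FIT machine itself, just a minimal nearest-neighbor illustration of cropping a detected-person box and resampling it to a fixed input size; the box format and output size are assumptions:

```python
import numpy as np

def crop_and_resize(image, box, out_h, out_w):
    """Crop a (y1, x1, y2, x2) box from an image and resize it to
    (out_h, out_w) by nearest-neighbor sampling, so every detected
    person reaches the classifier at the same input size."""
    y1, x1, y2, x2 = box
    crop = image[y1:y2, x1:x2]
    rows = (np.arange(out_h) * crop.shape[0] / out_h).astype(int)
    cols = (np.arange(out_w) * crop.shape[1] / out_w).astype(int)
    return crop[np.ix_(rows, cols)]

img = np.arange(100).reshape(10, 10)          # toy single-channel image
patch = crop_and_resize(img, (2, 2, 8, 8), 4, 4)
```

In practice a library resampler with interpolation would replace the nearest-neighbor indexing, but the role in the pipeline is the same.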
- Research Article
4
- 10.1117/1.jei.31.5.051409
- Apr 15, 2022
- Journal of Electronic Imaging
Human activity recognition is a field of video processing that requires restricted temporal analysis of video sequences for estimating the existence of different human actions. Designing an efficient human activity model requires credible implementations of keyframe extraction, preprocessing, feature extraction and selection, classification, and pattern recognition methods. In real-time video, sequences are untrimmed and do not have activity endpoints for effective recognition. Thus, we propose a hybrid gated recurrent unit and long short-term memory-based recurrent neural network model for high-efficiency human action recognition in untrimmed video datasets. The proposed model is tested on the TRECVID dataset, along with other online datasets, and is observed to have an accuracy of over 91% for untrimmed video-based activity recognition. This accuracy is compared with various state-of-the-art models and is found to be higher when evaluated on multiple datasets.
- Research Article
96
- 10.9781/ijimai.2017.447
- Jan 1, 2017
- International Journal of Interactive Multimedia and Artificial Intelligence
The increase in the number of elderly people living independently calls for special care in the form of healthcare monitoring systems. Recent advancements in depth video technologies have made human activity recognition (HAR) realizable for elderly healthcare applications. In this paper, a depth video-based novel method for HAR is presented using robust multi-features and embedded Hidden Markov Models (HMMs) to recognize the daily life activities of elderly people living alone in indoor environments such as smart homes. In the proposed HAR framework, depth maps are initially analyzed by a temporal motion identification method to segment human silhouettes from the noisy background and compute the depth silhouette area for each activity to track human movements in a scene. Several representative features, including invariant, multi-view differentiation, and spatiotemporal body-joint features, are fused together to explore gradient orientation change, intensity differentiation, temporal variation, and local motion of specific body parts. These features are then processed by the dynamics of their respective class and learned, modeled, trained, and recognized with a specific embedded HMM having active feature values. Furthermore, we construct a new online human activity dataset with a depth sensor to evaluate the proposed features. Our experiments on three depth datasets demonstrate that the proposed multi-features are efficient and robust over state-of-the-art features for human action and activity recognition.
- Research Article
- 10.1177/02783649251383947
- Oct 1, 2025
- The International Journal of Robotics Research
In this paper, we evaluate the performance of a self-powered sensing unit for human activity recognition (HAR). The system consists of two triboelectric nanogenerators (TENGs) that are embedded inside the insole of the left shoe. An Inertial Measurement Unit (IMU) is also attached to the ankle of the left foot to provide a reference point. The IMU serves to compare the performance of the TENG-based HAR to that of the IMU-based HAR. Five physical activities were monitored in this study: walking on a flat surface, walking upstairs, walking downstairs, running, and jumping. Each of these segments of activities was designed with a few seconds of idle time before and after for better annotation and segmentation. The idle periods enhanced data separation, reducing overlap between activities and ensuring clearer, more accurate analysis for each movement type. We observed that TENG data clearly identifies all five distinct activities based on specific gait pattern recognition. This capability illustrates the effectiveness of TENGs in capturing unique activity signatures with minimal interference. These activities are then classified by different machine learning algorithms with sufficient accuracy and minimal data preprocessing. Among the tested algorithms, the highest performance was obtained with the Random Forest classifier, reaching an accuracy of 93%. This work proves that TENG-based motion sensing is suitable for activity recognition for portable Internet of Things (IoT) devices with lower energy expenditure. Additionally, the findings highlight the potential of TENGs in promoting sustainable, energy-efficient wearable technology.
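Classifiers like the Random Forest used here typically operate on sliding-window statistics of the raw sensor stream rather than on raw samples. A minimal sketch of that feature-extraction step (window length, step, and the mean/std/peak-to-peak feature set are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def window_features(signal, win=50, step=25):
    """Slide a window over a 1-D sensor stream and compute simple
    statistics (mean, std, peak-to-peak) per window. Each row is
    one feature vector for the downstream classifier."""
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        feats.append([w.mean(), w.std(), w.max() - w.min()])
    return np.array(feats)

rng = np.random.default_rng(3)
# Toy periodic "walking" trace with a little sensor noise.
walk = np.sin(np.linspace(0, 20 * np.pi, 500)) + rng.normal(scale=0.05, size=500)
X = window_features(walk)
```

Distinct activities (walking vs. jumping, say) produce distinct distributions over such windows, which is what the classifier separates.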
- Research Article
- 10.54060/jmss.2023.44
- Jan 1, 2023
- Journal of Management and Service Science (JMSS)
Human Activity Recognition (HAR) is software that uses AI algorithms to recognize and categorize human physical activity. By analyzing signal data from multiple sensors such as accelerometers, gyroscopes, and magnetometers, the system is meant to recognize and categorize physical activities such as walking, running, jumping, ascending stairs, and others. To recognize human activity patterns, the HAR system employs signal preprocessing, feature extraction, and classification algorithms. The use of artificial intelligence techniques such as deep learning, convolutional neural networks, and support vector machines has improved the performance of HAR systems. The system may be utilized for a variety of purposes, including security, sports, fitness, and healthcare. Artificial Intelligence plays an important role in Human Activity Recognition by allowing systems to learn and adapt to new conditions. In general, the HAR framework is a beneficial asset for automated human activity recognition, facilitating the development of intelligent systems that can analyze human behaviour and improve quality of life. Overall, Human Activity Recognition using Artificial Intelligence is a promising technology that enables intelligent systems to recognize and classify human activities incrementally. This breakthrough has the potential to disrupt several industries and improve people's quality of life by enabling personalized medical treatment, improving sports performance, and improving road safety. The creation of this software sets the path for further study into themes such as the relationship between individual health status and physical activity. Overall, creating a successful Human Activity Recognition project using videos necessitates a broad understanding of AI and Deep Learning methods.
As a result, the success of this project highlights the value of creativity and perseverance in learning. Finally, it is the initial step towards developing more advanced systems that will improve people's lives in the future.
- Research Article
80
- 10.3389/fphys.2024.1344887
- Feb 21, 2024
- Frontiers in Physiology
Human activity recognition (HAR) plays a pivotal role in various domains, including healthcare, sports, robotics, and security. With the growing popularity of wearable devices, particularly Inertial Measurement Units (IMUs) and ambient sensors, researchers and engineers have sought to take advantage of these advances to accurately and efficiently detect and classify human activities. This research paper presents an advanced methodology for human activity and localization recognition, utilizing smartphone IMU, ambient, GPS, and audio sensor data from two public benchmark datasets: the Opportunity dataset and the Extrasensory dataset. The Opportunity dataset was collected from 12 subjects participating in a range of daily activities, and it captures data from various body-worn and object-associated sensors. The Extrasensory dataset features data from 60 participants, including thousands of data samples from smartphone and smartwatch sensors, labeled with a wide array of human activities. Our study incorporates novel feature extraction techniques for signal, GPS, and audio sensor data. Specifically, for localization, GPS, audio, and IMU sensors are utilized, while IMU and ambient sensors are employed for locomotion activity recognition. To achieve accurate activity classification, state-of-the-art deep learning techniques, such as convolutional neural networks (CNNs) and long short-term memory (LSTM), have been explored. For indoor/outdoor activities, CNNs are applied, while LSTMs are utilized for locomotion activity recognition. The proposed system has been evaluated using the k-fold cross-validation method, achieving accuracy rates of 97% and 89% for locomotion activity over the Opportunity and Extrasensory datasets, respectively, and 96% for indoor/outdoor activity over the Extrasensory dataset. These results highlight the efficiency of our methodology in accurately detecting various human activities, showing its potential for real-world applications.
Moreover, the research paper introduces a hybrid system that combines machine learning and deep learning features, enhancing activity recognition performance by leveraging the strengths of both approaches.
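The k-fold evaluation protocol cited above is straightforward to reproduce. A minimal dependency-free sketch of the index bookkeeping (the fold count is the generic k=5, not necessarily the paper's choice):

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k-fold cross-validation.
    Every sample appears in exactly one test fold, and train/test
    never overlap within a fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(10, k=5))
```

Averaging the per-fold accuracies of the CNN/LSTM models over these splits yields the reported cross-validated scores.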
- Book Chapter
66
- 10.1007/978-3-319-62704-5_7
- Jan 1, 2017
As one of the fastest spreading technologies and due to their rich sensing features, smartphones have become popular elements of modern human activity recognition systems. Besides activity recognition, smartphones have also been employed with success in fall detection/recognition systems, although a combined approach has not been evaluated yet. This article presents the results of a comprehensive evaluation of using a smartphone’s acceleration sensor for human activity and fall recognition, including 12 different types of activities of daily living (ADLs) and 4 different types of falls, recorded from 66 subjects in the context of creating “MobiAct”, a publicly available dataset for benchmarking and developing human activity and fall recognition systems. An optimized feature selection and classification scheme is proposed for each, a basic, i.e. recognition of 6 common ADLs only (99.9% accuracy), and a more complex human activity recognition task that includes all 12 ADLs and 4 falls (96.8% accuracy).
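A common first-pass cue in accelerometer-based fall recognition of the kind benchmarked by MobiAct is the signal magnitude vector (SMV) exceeding a threshold. The sketch below is a generic illustration; the 2.5 g threshold and the synthetic spike are assumptions, not values from the MobiAct study:

```python
import numpy as np

def fall_candidates(acc_xyz, g=9.81, thresh=2.5):
    """Flag samples whose signal magnitude vector exceeds `thresh` g.
    The 2.5 g threshold is an illustrative choice, not MobiAct's."""
    smv = np.linalg.norm(acc_xyz, axis=1) / g   # magnitude in units of g
    return np.flatnonzero(smv > thresh)

acc = np.tile([0.0, 0.0, 9.81], (100, 1))   # standing still: 1 g on z-axis
acc[40] = [0.0, 15.0, 30.0]                 # brief impact-like spike
idx = fall_candidates(acc)
```

In a full system such candidate windows would then be passed to the feature-selection and classification scheme the chapter optimizes, rather than treated as falls outright.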
- Research Article
27
- 10.3390/jimaging9070130
- Jun 26, 2023
- Journal of Imaging
Vision-based human activity recognition (HAR) has emerged as one of the essential research areas in video analytics. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams. These deep learning algorithms have shown impressive performance for the video analytics task. However, these newly introduced methods focus exclusively on either model performance or computational efficiency, resulting in a biased trade-off between robustness and computational efficiency in their proposed methods for dealing with the challenging HAR problem. To enhance both accuracy and computational efficiency, this paper presents a computationally efficient yet generic spatial-temporal cascaded framework that exploits deep discriminative spatial and temporal features for HAR. For efficient representation of human actions, we propose an efficient dual attentional convolutional neural network (DA-CNN) architecture that leverages a unified channel-spatial attention mechanism to extract human-centric salient features in video frames. The dual channel-spatial attention layers together with the convolutional layers learn to be more selective in the spatial receptive fields having objects within the feature maps. The extracted discriminative salient features are then forwarded to a stacked bi-directional gated recurrent unit (Bi-GRU) for long-term temporal modeling and recognition of human actions using both forward and backward pass gradient learning. Extensive experiments are conducted on three publicly available human action datasets, where the obtained results verify the effectiveness of our proposed framework (DA-CNN+Bi-GRU) over the state-of-the-art methods in terms of model accuracy and inference runtime across each dataset.
Experimental results show that the DA-CNN+Bi-GRU framework attains an improvement in execution time up to 167× in terms of frames per second as compared to most of the contemporary action-recognition methods.
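The channel half of a channel-spatial attention mechanism can be illustrated with a squeeze-and-excite style sketch. This is a generic illustration, not the DA-CNN layer itself; the bottleneck size and weight shapes are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feature_map, W1, W2):
    """Squeeze-and-excite style channel attention: global-average-pool
    each channel, pass through a small bottleneck, and rescale the
    channels so informative ones are amplified."""
    pooled = feature_map.mean(axis=(1, 2))         # (C,) squeeze
    weights = sigmoid(W2 @ np.tanh(W1 @ pooled))   # (C,) excitation
    return feature_map * weights[:, None, None], weights

rng = np.random.default_rng(4)
C, H, W = 8, 4, 4
fmap = rng.normal(size=(C, H, W))          # toy convolutional feature map
W1 = rng.normal(scale=0.1, size=(2, C))    # bottleneck down to 2 units
W2 = rng.normal(scale=0.1, size=(C, 2))
out, w = channel_attention(fmap, W1, W2)
```

A spatial-attention branch would analogously pool across channels to produce an H×W weight map; combining both gives the unified channel-spatial attention the abstract describes.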
- Research Article
23
- 10.1016/j.ins.2023.01.121
- Feb 1, 2023
- Information Sciences
Bimodal HAR-An efficient approach to human activity analysis and recognition using bimodal hybrid classifiers
- Conference Article
263
- 10.1109/iccvw.2011.6130379
- Nov 1, 2011
In this paper, we present a home-monitoring oriented human activity recognition benchmark database, based on the combination of a color video camera and a depth sensor. Our contributions are two-fold: 1) We have created a publicly releasable human activity video database (i.e., named as RGBD-HuDaAct), which contains synchronized color-depth video streams, for the task of human daily activity recognition. This database aims at encouraging more research efforts on human activity recognition based on multi-modality sensor combination (e.g., color plus depth). 2) Two multi-modality fusion schemes, which naturally combine color and depth information, have been developed from two state-of-the-art feature representation methods for action recognition, i.e., spatio-temporal interest points (STIPs) and motion history images (MHIs). These depth-extended feature representation methods are evaluated comprehensively and superior recognition performances over their uni-modality (e.g., color only) counterparts are demonstrated.
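Of the two representations evaluated, the motion history image (MHI) is simple enough to sketch directly. The decay constant, motion threshold, and toy frames below are illustrative assumptions; only the MHI recurrence itself comes from the standard definition:

```python
import numpy as np

def motion_history_image(frames, tau=255, decay=32, motion_thresh=30):
    """Build a motion history image: pixels that moved between
    consecutive frames are set to tau; static pixels decay toward
    zero, so the most recent motion appears brightest."""
    mhi = np.zeros_like(frames[0], dtype=float)
    for prev, cur in zip(frames, frames[1:]):
        moving = np.abs(cur.astype(int) - prev.astype(int)) > motion_thresh
        mhi = np.where(moving, tau, np.maximum(mhi - decay, 0))
    return mhi

f0 = np.zeros((6, 6), dtype=np.uint8)
f1 = f0.copy(); f1[2, 2] = 200   # pixel changes at t=1
f2 = f1.copy(); f2[4, 4] = 200   # another pixel changes at t=2
mhi = motion_history_image([f0, f1, f2])
```

The depth-extended variant in the paper applies the same idea to the depth channel, so motion toward or away from the camera also leaves a trace.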