A systematic review of deep learning-based models for elderly and human activity recognition

Similar Papers
  • Research Article
  • Citations: 27
  • 10.1155/2016/1784101
An Analysis of Audio Features to Develop a Human Activity Recognition Model Using Genetic Algorithms, Random Forests, and Neural Networks
  • Jan 1, 2016
  • Mobile Information Systems
  • Carlos E Galván-Tejada + 8 more

This work presents a human activity recognition (HAR) model based on audio features. Using sound as an information source for HAR models is challenging because sound wave analyses generate very large amounts of data. However, feature selection techniques can reduce the amount of data required to represent an audio signal sample. The audio features analyzed include Mel-frequency cepstral coefficients (MFCC). Although MFCC are commonly used in voice and instrument recognition, their utility within HAR models had yet to be confirmed, and this work validates their usefulness. Additionally, statistical features were extracted from the audio samples to generate the proposed HAR model. The amount of information needed to build a HAR model directly affects the accuracy of the model. This problem was also tackled in the present work; our results indicate that the proposed HAR model can recognize a human activity with an accuracy of 85%. This means that minimal computational costs are needed, thus allowing portable devices to identify human activities using audio as an information source.

  • Research Article
  • Citations: 34
  • 10.1007/s41870-021-00719-6
Feature fusion using deep learning for smartphone based human activity recognition
  • Jan 1, 2021
  • International Journal of Information Technology
  • Dipanwita Thakur + 1 more

Identification of human physical activities has long been an active research area because of its application in personalized health and fitness monitoring. The accuracy of human activity recognition (HAR) models mainly depends on the features, which are extracted using domain knowledge. These features are the input to the classification algorithm that identifies human physical activities. Manually extracted (handcrafted) features require expert domain knowledge, and they therefore carry significant weight in distinguishing different human activities. Recently, deep learning methods have been used to extract features automatically from raw sensor data for HAR models. However, the state-of-the-art HAR literature has established that the importance of handcrafted features cannot be ignored, as they are derived from expert domain knowledge. Thus, in this paper we fuse handcrafted features with features extracted automatically by deep learning (DL) to enhance the performance of the HAR model. Extensive experimental results demonstrate that the proposed feature-fusion-based HAR model achieves higher accuracy than the state-of-the-art HAR literature on both the self-collected and the public dataset.

  • Book Chapter
  • Citations: 1
  • 10.4018/979-8-3693-1738-9.ch006
Human Action Recognition Based on YOLOv7
  • Feb 23, 2024
  • Chenwei Liang + 1 more

Human action recognition is a fundamental research problem in computer vision, and accurate recognition has important applications. In this book chapter, the authors use a YOLOv7-based model for human action recognition. To evaluate the performance of the model, the action recognition results of YOLOv7 were compared with those obtained using CNN+LSTM, YOLOv5, and YOLOv4. Furthermore, a small human action data set suitable for YOLO model training is designed. This data set is composed of images extracted from the KTH, Weizmann, and MSR data sets, and the authors use it to verify the experimental results. The final experimental results show that the YOLOv7 model is convenient and effective for human action recognition compared with the previous YOLO models.

  • Conference Article
  • Citations: 17
  • 10.23919/chicc.2019.8865142
Activity Recognition from Mobile Phone using Deep CNN
  • Jul 1, 2019
  • Wei Wu + 1 more

Achieving better performance has always been an important research target in mobile phone-based human activity recognition (HAR). Traditional activity recognition methods mainly rely on manual feature extraction, but manually selected features are not always effective, which limits improvements in recognition accuracy. This paper introduces a deep convolutional neural network (CNN) model for human activity recognition that can effectively improve recognition accuracy. First, we manually collected 128-dimensional time-domain sequence features from the smartphone's accelerometer and gyroscope data. We then use a time-domain to space-domain transformation, the Gramian Angular Field transform, to convert these time-domain signals into a 128×128 image, which lets us take full advantage of deep learning models that are highly effective in computer vision. This allows us to exploit the powerful feature representation capabilities of deep CNNs, and we construct an 8-layer convolutional neural network model for human activity recognition. Experimental results on the UCI HAR dataset confirm the effectiveness of our method: the recognition accuracy is satisfactory and competitive with traditional and state-of-the-art methods.
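
The time-series-to-image step described in this abstract can be illustrated with a minimal sketch. It assumes the summation variant of the Gramian Angular Field (the abstract does not say which variant the authors used), and the function name and the synthetic sine window are hypothetical. Each value is rescaled to [-1, 1], mapped to an angle with arccos, and entry (i, j) of the image is the cosine of the sum of the two angles.

```python
import math

def gramian_angular_field(series):
    """Return the Gramian Angular Summation Field of a 1-D sequence.

    Assumption: summation variant, cos(phi_i + phi_j); a difference
    variant, sin(phi_i - phi_j), also exists.
    """
    lo, hi = min(series), max(series)
    span = hi - lo
    # Rescale to [-1, 1] so arccos is defined; a constant series maps to 0.
    scaled = [0.0 if span == 0 else 2.0 * (v - lo) / span - 1.0 for v in series]
    # Clamp against floating-point drift before taking arccos.
    angles = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    # Entry (i, j) is cos(phi_i + phi_j), giving an n x n image.
    return [[math.cos(a + b) for b in angles] for a in angles]

# Hypothetical 128-sample sensor window (one sine period as stand-in data).
window = [math.sin(2 * math.pi * t / 128) for t in range(128)]
image = gramian_angular_field(window)
print(len(image), len(image[0]))  # 128 128
```

A 128-sample window therefore yields a 128×128 matrix that an image CNN can take as input.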

  • Research Article
  • Citations: 36
  • 10.1016/j.ins.2023.119394
HAR-DeepConvLG: Hybrid deep learning-based model for human activity recognition in IoT applications
  • Jul 17, 2023
  • Information Sciences
  • Weiping Ding + 2 more

  • Conference Article
  • Citations: 5
  • 10.1109/sbesc56799.2022.9964520
Assessment and Optimization of 1D CNN Model for Human Activity Recognition
  • Nov 21, 2022
  • Rafael Schild Reusch + 2 more

Artificial Intelligence (AI) solves complex tasks like human activity and speech recognition. Accuracy-driven AI models have introduced new challenges related to their applicability in resource-scarce systems. In Human Activity Recognition (HAR), state-of-the-art proposals use complex multi-layer LSTM networks. The literature states that LSTM networks are suitable for handling time-series data, a key feature of HAR. Most works in the literature seek the best possible accuracy, with few evaluating the overall computational cost of running the inference phase. In HAR, low-power IoT devices such as wearable sensors are widely used as data-gathering devices, but little effort is made to deploy AI technology on these devices. Most studies suggest an approach using edge devices or cloud computing architectures, where the end device's task is to gather and send data to the edge/cloud device. Most voice assistants, such as Amazon's Alexa and Google, use this architecture. In real-life applications, mainly in the healthcare industry, relying only on edge/cloud devices is not acceptable since these devices are not always available or reachable. The objective of this work is to evaluate the accuracy of convolutional networks with a simpler architecture, using 1D convolution, for HAR. The motivation for using simpler network architectures is the possibility of embedding them in power- and memory-constrained devices.
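
To show what a single 1D convolution does to a sensor window, here is a minimal pure-Python sketch. It is not the authors' network: the kernel values, the sample data, and the function names are hypothetical, and a real layer would learn many such kernels.

```python
def conv1d(signal, kernel, stride=1):
    """Valid-mode 1-D cross-correlation, the operation CNN layers apply (no kernel flip)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]

# Hypothetical accelerometer samples along one axis.
accel_x = [0.0, 0.1, 0.4, 0.9, 0.4, 0.1, 0.0, -0.1]
# Hypothetical kernel that responds to rising values.
edge_kernel = [-1.0, 0.0, 1.0]
features = relu(conv1d(accel_x, edge_kernel))
print(len(features))  # 6
```

Each output costs only as many multiply-adds as the kernel has taps, which is why such layers suit power- and memory-constrained devices.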

  • Research Article
  • Citations: 16
  • 10.3390/s19122790
Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) Model for Human Action Recognition
  • Jun 21, 2019
  • Sensors (Basel, Switzerland)
  • Saima Nazir + 3 more

Human action recognition (HAR) has emerged as a core research domain for video understanding and analysis, thus attracting many researchers. Although significant results have been achieved in simple scenarios, HAR is still a challenging task due to issues associated with view independence, occlusion and inter-class variation observed in realistic scenarios. In previous research efforts, the classical bag of visual words approach along with its variations has been widely used. In this paper, we propose a Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) model for human action recognition without compromising the strengths of the classical bag of visual words approach. Expressions are formed based on the density of a spatio-temporal cube of a visual word. To handle inter-class variation, we use class-specific visual word representation for visual expression generation. In contrast to the Bag of Expressions (BoE) model, the formation of visual expressions is based on the density of spatio-temporal cubes built around each visual word, as constructing neighborhoods with a fixed number of neighbors could include non-relevant information, making a visual expression less discriminative in scenarios with occlusion and changing viewpoints. Thus, the proposed approach makes the model more robust to the occlusion and changing-viewpoint challenges present in realistic scenarios. Furthermore, we train a multi-class Support Vector Machine (SVM) to classify bags of expressions into action classes. Comprehensive experiments on four publicly available datasets (KTH, UCF Sports, UCF11 and UCF50) show that the proposed model outperforms existing state-of-the-art human action recognition methods in terms of accuracy, reaching 99.21%, 98.60%, 96.94% and 94.10%, respectively.

  • Research Article
  • Citations: 853
  • 10.1609/aaai.v31i1.11212
An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data
  • Feb 12, 2017
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Sijie Song + 4 more

Human action recognition is an important task in computer vision. Extracting discriminative spatial and temporal features to model the spatial and temporal evolutions of different actions plays a key role in accomplishing this task. In this work, we propose an end-to-end spatial and temporal attention model for human action recognition from skeleton data. We build our model on top of the Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM), which learns to selectively focus on discriminative joints of skeleton within each frame of the inputs and pays different levels of attention to the outputs of different frames. Furthermore, to ensure effective training of the network, we propose a regularized cross-entropy loss to drive the model learning process and develop a joint training strategy accordingly. Experimental results demonstrate the effectiveness of the proposed model, both on the small human action recognition dataset of SBU and the currently largest NTU dataset.

  • Research Article
  • Citations: 31
  • 10.1007/s12652-022-03768-2
An active semi-supervised deep learning model for human activity recognition
  • Mar 20, 2022
  • Journal of Ambient Intelligence and Humanized Computing
  • Haixia Bi + 4 more

Human activity recognition (HAR), which aims at inferring the behavioral patterns of people, is a fundamental research problem in digital health and ambient intelligence. The application of machine learning methods in HAR has been investigated vigorously in recent years. However, there are still a number of challenges confronting the task, where one significant barrier lies in the longstanding shortage of annotations. To address this issue, we establish a new paradigm for HAR, which integrates active learning and semi-supervised learning into one framework. The main idea is to reduce the annotation cost by actively selecting the most informative samples for annotation, as well as leveraging the unlabelled instances in a semi-supervised way. In particular, we propose to utilize the massive unlabelled data via temporal ensembling of convolutional neural networks (CNN), which yields robust consensus predictions by aggregating the outputs of the training networks on different epochs. We conducted extensive experiments on three public benchmark datasets. The proposed method achieves Macro F1 values of 0.76, 0.45 and 0.91 in a low annotation scenario on PAMAP2, USCHAD and UCIHAR datasets respectively, outperforming a multitude of state-of-the-art deep models. The ablation study proves the effectiveness of the two components of the framework, i.e., active learning-based sample selection and semi-supervised model training with temporal ensembling, in alleviating the issue of insufficient labels. Cross-validation and statistical significance experiments further demonstrate the robustness and generalization ability of the proposed method. The source codes are available at https://github.com/HaixiaBi1982/ActSemiCNNAct.
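
The aggregation of network outputs across epochs can be sketched as follows. This assumes the commonly used temporal-ensembling update, an exponential moving average with a startup bias correction; the abstract does not give the exact rule the authors used, and the function name, the momentum value, and the per-epoch predictions are hypothetical.

```python
def temporal_ensemble_step(accumulator, prediction, alpha, epoch):
    """Update the running ensemble for one sample and return its training target.

    accumulator: running average Z of past predictions (starts at all zeros)
    prediction:  class probabilities z from the current epoch
    alpha:       momentum in [0, 1); epoch is counted from 1
    """
    # Z <- alpha * Z + (1 - alpha) * z
    accumulator = [alpha * a + (1.0 - alpha) * p for a, p in zip(accumulator, prediction)]
    # Divide by (1 - alpha^t) to undo the bias from the zero initialisation.
    correction = 1.0 - alpha ** epoch
    target = [a / correction for a in accumulator]
    return accumulator, target

# Hypothetical softmax outputs for one unlabelled sample over three epochs.
epoch_predictions = [[0.6, 0.4], [0.8, 0.2], [0.7, 0.3]]
accumulator = [0.0, 0.0]
targets = []
for epoch, prediction in enumerate(epoch_predictions, start=1):
    accumulator, target = temporal_ensemble_step(accumulator, prediction, alpha=0.6, epoch=epoch)
    targets.append(target)
```

The corrected targets remain valid probability distributions, and in this formulation they serve as consensus labels for the unlabelled samples.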

  • Book Chapter
  • 10.1007/978-3-030-78124-8_6
Assessment of Deep Learning Models for Human Activity Recognition on Multi-variate Time Series Data and Non-targeted Adversarial Attack
  • Nov 3, 2021
  • Mahbuba Tasmin + 6 more

Human activity recognition (HAR) is one of the leading research fields in ubiquitous computing, working to integrate seamless technologies into our daily lives. Research in this field focuses on advancing fine-grained activity recognition with minimal technological deployment while taking human factors into account. Domain-specific knowledge of human activity recognition has placed significant importance on activity data processing, owing to the innately random nature of the activity data. Improved prediction accuracy within limited infrastructural support and cost is necessary for research to be conducted at scale. The data set used in this study was modified through a tree-based feature-engineering method, which significantly altered the intrinsic time-series pattern of the original data set acquired from the UCI Machine Learning Repository. In this paper, the primary focus is on improving the classification accuracy of the human activities in the data set using deep learning classifier models. Because the modified data is largely devoid of time-series patterns, the benchmark time-series classification models, namely Keras-LSTM and RNN-LSTM, achieved lower classification accuracy. CNN-based networks demonstrated consistently improved training performance on the data set. The state-of-the-art classifier ResNet, introduced by Microsoft Research in 2015, achieved 99.9% classification accuracy on the data set of this study. Another important aim of this study is to assess the risk that adversarial attacks pose to machine learning models. The high vulnerability of machine learning models to adversarial attacks has emerged as a security concern for the machine learning deployment pipeline. To the best of the authors' knowledge, no prior study has examined the insecurity associated with the human activity recognition data set and the corresponding benchmark models. In this study, two prevalent adversarial approaches, the Fast Gradient Sign Method (FGSM) and the Basic Iterative Method (BIM), were employed to carry out non-targeted attacks on the pre-trained ResNet model. The accuracy loss caused by the adversarial attack over a small range of added perturbation is severe: nearly 96% of the accuracy is lost. Finally, we compare the performance of FGSM and BIM in reducing the classification accuracy of the pre-trained model. The findings shed light on how the nature of the data influences classification accuracy and on the resilience of the state-of-the-art classifier model when facing adversarial attacks.
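
The two attacks named in this abstract can be sketched on a toy model. The example below uses a hand-written logistic-regression classifier rather than the paper's ResNet so that the input gradient can be computed without a deep learning library; the weights, input, and step sizes are hypothetical. FGSM takes one step of size eps along the sign of the loss gradient with respect to the input; BIM repeats smaller steps and clips the result to stay within eps of the original input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def loss_gradient(x, y, w, b):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method: one step of size eps (non-targeted)."""
    g = loss_gradient(x, y, w, b)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def bim(x, y, w, b, eps, step, iterations):
    """Basic Iterative Method: repeated small steps, clipped to the eps-ball around x."""
    adv = list(x)
    for _ in range(iterations):
        g = loss_gradient(adv, y, w, b)
        adv = [ai + step * sign(gi) for ai, gi in zip(adv, g)]
        adv = [min(max(ai, xi - eps), xi + eps) for ai, xi in zip(adv, x)]
    return adv

# Hypothetical toy classifier and a correctly classified input with label 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_fgsm = fgsm(x, y, w, b, eps=0.3)
x_bim = bim(x, y, w, b, eps=0.3, step=0.1, iterations=5)
```

In this toy setting both attacks push the predicted probability of the true class below 0.5 while changing each input feature by at most eps.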

  • Research Article
  • Citations: 15
  • 10.3390/app12115408
Marfusion: An Attention-Based Multimodal Fusion Model for Human Activity Recognition in Real-World Scenarios
  • May 26, 2022
  • Applied Sciences
  • Yunhan Zhao + 5 more

Human Activity Recognition (HAR) plays an important role in the field of ubiquitous computing and can benefit various human-centric applications such as smart homes, health monitoring, and aging systems. Human Activity Recognition mainly leverages smartphones and wearable devices to collect sensory signals labeled with activity annotations and to train machine learning models that recognize individuals' activity automatically. However, there are two major barriers to deploying a Human Activity Recognition model in real-world scenarios. First, sensor data and activity labels are traditionally collected using special experimental equipment in a controlled environment, which means that models fitted to these datasets may generalize poorly to real-life scenarios. Second, existing studies focus on a single or a few modalities of sensor readings, neglecting useful information, and the relations within it, in multimodal sensor data. To tackle these issues, we propose a novel activity recognition model for multimodal sensory data fusion, Marfusion, and an experimental data collection platform for HAR tasks in real-world scenarios, MarSense. Specifically, Marfusion uses a convolution structure to extract sensory features for each modality of the smartphone sensors and then fuses the multimodal features using an attention mechanism. MarSense can automatically collect a large amount of smartphone sensor data from multiple users under their natural usage conditions and environments. To evaluate the proposed platform and model, we conduct a real-life data collection experiment among university students and then compare our Marfusion model with several other state-of-the-art models on the collected datasets. The experimental results not only indicate that the proposed platform successfully collected Human Activity Recognition data in a real-world scenario, but also verify the advantages of the Marfusion model over existing models in Human Activity Recognition.

  • Research Article
  • Citations: 83
  • 10.1016/j.asoc.2022.109363
A survey on unsupervised learning for wearable sensor-based activity recognition
  • Jul 25, 2022
  • Applied Soft Computing
  • Ayokunle Olalekan Ige + 1 more

  • Research Article
  • Citations: 5
  • 10.3233/ais-220125
Seq2seq model for human action recognition based on skeleton and two-layer bidirectional LSTM
  • Dec 5, 2023
  • Journal of Ambient Intelligence and Smart Environments
  • Shouke Wei + 3 more

Human action recognition (HAR) plays an important role in social interaction in various fields. This study proposes a lightweight skeleton and two-layer bidirectional LSTM-based Seq2Seq model (SB2_Seq2Seq) for HAR that trades off recognition accuracy, users' privacy and computing resource usage. An experiment was conducted to compare the proposed SB2_Seq2Seq with other skeleton-based Seq2Seq models and with non-skeleton RGB video frame-based LSTM, CNN and Seq2Seq models. The UCF50 dataset was used for model evaluation, with 60%, 20% and 20% of the data used for model training, validation and testing, respectively. The experimental results show that the proposed model achieves 93.54% accuracy with a Mean Square Error (MSE) of 0.0214, suggesting that the proposed model outperforms all the other models. The results also show that the proposed model achieves state-of-the-art accuracy compared with state-of-the-art methods in the literature.
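
A 60/20/20 partition like the one described in this abstract can be sketched as below. The abstract does not say how samples were assigned to each subset, so the random shuffle, the fixed seed, and the function name are assumptions made for illustration only.

```python
import random

def split_60_20_20(items, seed=0):
    """Shuffle items and split them into 60% train, 20% validation, 20% test."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the subset sizes.
    n_train = len(shuffled) * 60 // 100
    n_val = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical clip identifiers standing in for dataset samples.
train, val, test = split_60_20_20(range(50))
print(len(train), len(val), len(test))  # 30 10 10
```

When clips cut from the same source video exist, splitting by source rather than by clip is a common way to avoid leakage between subsets.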

  • Research Article
  • Citations: 20
  • 10.3390/s21103381
Human Activity Recognition for People with Knee Osteoarthritis-A Proof-of-Concept.
  • May 12, 2021
  • Sensors
  • Jay-Shian Tan + 8 more

Clinicians lack objective means for monitoring if their knee osteoarthritis patients are improving outside of the clinic (e.g., at home). Previous human activity recognition (HAR) models using wearable sensor data have only used data from healthy people and such models are typically imprecise for people who have medical conditions affecting movement. HAR models designed for people with knee osteoarthritis have classified rehabilitation exercises but not the clinically relevant activities of transitioning from a chair, negotiating stairs and walking, which are commonly monitored for improvement during therapy for this condition. Therefore, it is unknown if a HAR model trained on data from people who have knee osteoarthritis can be accurate in classifying these three clinically relevant activities. Therefore, we collected inertial measurement unit (IMU) data from 18 participants with knee osteoarthritis and trained convolutional neural network models to identify chair, stairs and walking activities, and phases. The model accuracy was 85% at the first level of classification (activity), 89–97% at the second (direction of movement) and 60–67% at the third level (phase). This study is the first proof-of-concept that an accurate HAR system can be developed using IMU data from people with knee osteoarthritis to classify activities and phases of activities.

  • Research Article
  • Citations: 218
  • 10.1016/j.jksuci.2019.09.004
A new hybrid deep learning model for human action recognition
  • Sep 9, 2019
  • Journal of King Saud University - Computer and Information Sciences
  • Neziha Jaouedi + 2 more
