Benchmarking Radar Preprocessing Techniques and Transfer Learning Models for FMCW-based Human Activity Recognition
- Research Article
27
- 10.1155/2016/1784101
- Jan 1, 2016
- Mobile Information Systems
This work presents a human activity recognition (HAR) model based on audio features. Using sound as an information source for HAR models is challenging because sound wave analyses generate very large amounts of data. However, feature selection techniques may reduce the amount of data required to represent an audio signal sample. The audio features analyzed include Mel-frequency cepstral coefficients (MFCC). Although MFCC are commonly used in voice and instrument recognition, their utility within HAR models had yet to be confirmed, and this work validates their usefulness. Additionally, statistical features were extracted from the audio samples to generate the proposed HAR model. The amount of information needed to build a HAR model directly affects the accuracy of the model. This problem was also tackled in the present work; our results indicate that the proposed HAR model can recognize a human activity with an accuracy of 85%. This means that minimal computational costs are needed, thus allowing portable devices to identify human activities using audio as an information source.
- Research Article
228
- 10.1109/tpami.2017.2691768
- Apr 6, 2017
- IEEE Transactions on Pattern Analysis and Machine Intelligence
Recognizing human actions from unknown and unseen (novel) views is a challenging problem. We propose a Robust Non-Linear Knowledge Transfer Model (R-NKTM) for human action recognition from novel views. The proposed R-NKTM is a deep fully-connected neural network that transfers knowledge of human actions from any unknown view to a shared high-level virtual view by finding a set of non-linear transformations that connects the views. The R-NKTM is learned from 2D projections of dense trajectories of synthetic 3D human models fitted to real motion capture data and generalizes to real videos of human actions. The strength of our technique is that we learn a single R-NKTM for all actions and all viewpoints for knowledge transfer of any real human action video without the need for re-training or fine-tuning the model. Thus, R-NKTM can efficiently scale to incorporate new action classes. R-NKTM is learned with dummy labels and does not require knowledge of the camera viewpoint at any stage. Experiments on three benchmark cross-view human action datasets show that our method outperforms existing state-of-the-art.
- Conference Article
17
- 10.23919/chicc.2019.8865142
- Jul 1, 2019
Achieving better performance has always been an important research target in the field of human activity recognition (HAR) based on mobile phones. Traditional activity recognition methods mainly rely on manual feature extraction, but manually selected features are not always effective, which limits improvements in recognition accuracy. This paper introduces a deep convolutional neural network (CNN) model for human activity recognition, which can effectively improve recognition accuracy. First, we manually collected 128-dimensional time domain sequence features from the accelerometer and gyroscope sensor data of the smartphone. We then use a time domain to space domain transformation algorithm, namely the Gramian Angular Fields transform, to convert these time domain signals into a 128×128 image, which makes it possible to take full advantage of deep learning models that are very effective in computer vision. We can thus utilize the powerful feature representation capabilities of deep CNNs, and we construct an 8-layer convolutional neural network model for human activity recognition. Experimental results on the UCI HAR dataset confirm the effectiveness of our method; the recognition accuracy is satisfactory and competitive compared with traditional and state-of-the-art methods.
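The time-domain-to-image step named in the abstract above can be sketched as follows. This is a minimal illustration of the summation variant of a Gramian Angular Field, assuming min-max rescaling to [-1, 1]; the function name, the choice of variant and the synthetic input are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def gramian_angular_summation_field(series):
    """Map a 1-D series of length n to an n x n GASF image (illustrative helper)."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    # Rescale to [-1, 1] so that arccos is defined; a constant series maps to 0.
    if hi == lo:
        scaled = np.zeros_like(x)
    else:
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against rounding error
    phi = np.arccos(scaled)              # polar-angle encoding of each sample
    # GASF[i, j] = cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])


# Synthetic 128-sample window standing in for one sensor axis (assumed data).
window = np.sin(np.linspace(0.0, 4.0 * np.pi, 128))
image = gramian_angular_summation_field(window)
```

A 128-sample window yields a 128×128 symmetric array with values in [-1, 1], which can then be fed to an image-based CNN.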
- Book Chapter
1
- 10.4018/979-8-3693-1738-9.ch006
- Feb 23, 2024
Human action recognition is a fundamental research problem in computer vision, and accurate human action recognition has important applications. In this book chapter, the authors use a YOLOv7-based model for human action recognition. To evaluate the performance of the model, the action recognition results of YOLOv7 were compared with those obtained using CNN+LSTM, YOLOv5, and YOLOv4. Furthermore, a small human action dataset suitable for YOLO model training is designed. This dataset is composed of images extracted from the KTH, Weizmann, and MSR datasets, and the authors use it to verify the experimental results. The final experimental results show that using the YOLOv7 model for human action recognition is very convenient and effective compared with the previous YOLO models.
- Research Article
34
- 10.1007/s41870-021-00719-6
- Jan 1, 2021
- International Journal of Information Technology
Identification of human physical activities has long been an active research area due to its application in personalized health and fitness monitoring. The accuracy of human activity recognition (HAR) models mainly depends on the features, which are extracted using domain knowledge. The features are the input of the classification algorithm used to efficiently identify human physical activities. Manually extracted (handcrafted) features need expert domain knowledge. Thus, these features are highly important for identifying different human activities. Recently, deep learning methods have been used to extract features automatically from raw sensory data for HAR models. However, the state-of-the-art HAR literature has established that the importance of handcrafted features cannot be ignored, as they are extracted using expert domain knowledge. Thus, in this paper we use the fusion of both handcrafted features and features extracted automatically using deep learning (DL) to enhance the performance of the HAR model. Extensive experimental results demonstrate that our proposed feature-fusion-based HAR model gives higher accuracy than the state-of-the-art HAR literature on both self-collected and public datasets.
- Research Article
- 10.1088/1742-6596/1087/6/062011
- Sep 1, 2018
- Journal of Physics: Conference Series
Human action recognition judges human action by analyzing action characteristics in the fields of computer vision and video surveillance. With the development of machine learning techniques, the application of Bayesian learning models is increasing in related fields. In order to analyze the characteristics of human action and then recognize human action, this paper presents a survey of Bayesian learning models for human action recognition. The paper focuses on Bayesian handcrafted and deep learning models, and evaluates the state-of-the-art benchmark datasets, e.g., Weizmann, KTH, MSR-3D, HOHA, and UCF101. The surveyed papers were published between 2007 and 2016, which provides an overview of the progress in this area.
- Research Article
5
- 10.3233/ais-220125
- Dec 5, 2023
- Journal of Ambient Intelligence and Smart Environments
Human action recognition (HAR) plays an important role in social interaction in various fields. This study proposes a light-weight skeleton and two-layer bidirectional LSTM-based Seq2Seq model (SB2_Seq2Seq) for HAR to trade off recognition accuracy, users’ privacy and computer resource usage. An experiment was conducted to compare the proposed SB2_Seq2Seq with other skeleton-based Seq2Seq models and non-skeleton RGB video frame-based LSTM, CNN and Seq2Seq models. The UCF50 dataset was used for model evaluation, with 60%, 20% and 20% of the data used for model training, validation and testing, respectively. The experimental results show that the proposed model achieves 93.54% accuracy with a Mean Square Error (MSE) of 0.0214, suggesting that the proposed model outperforms all the other models. The results also show that the proposed model achieves state-of-the-art accuracy compared with state-of-the-art methods in the literature.
- Research Article
26
- 10.1016/j.pmcj.2012.08.004
- Aug 10, 2012
- Pervasive and Mobile Computing
ADR-SPLDA: Activity discovery and recognition by combining sequential patterns and latent Dirichlet allocation
- Research Article
16
- 10.3390/s19122790
- Jun 21, 2019
- Sensors (Basel, Switzerland)
Human action recognition (HAR) has emerged as a core research domain for video understanding and analysis, thus attracting many researchers. Although significant results have been achieved in simple scenarios, HAR is still a challenging task due to issues associated with view independence, occlusion and inter-class variation observed in realistic scenarios. In previous research efforts, the classical bag of visual words approach along with its variations has been widely used. In this paper, we propose a Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) model for human action recognition without compromising the strengths of the classical bag of visual words approach. Expressions are formed based on the density of a spatio-temporal cube of a visual word. To handle inter-class variation, we use class-specific visual word representation for visual expression generation. In contrast to the Bag of Expressions (BoE) model, the formation of visual expressions is based on the density of spatio-temporal cubes built around each visual word, as constructing neighborhoods with a fixed number of neighbors could include non-relevant information, making a visual expression less discriminative in scenarios with occlusion and changing viewpoints. Thus, the proposed approach makes the model more robust to the occlusion and changing viewpoint challenges present in realistic scenarios. Furthermore, we train a multi-class Support Vector Machine (SVM) for classifying bags of expressions into action classes. Comprehensive experiments on four publicly available datasets (KTH, UCF Sports, UCF11 and UCF50) show that the proposed model outperforms existing state-of-the-art human action recognition methods in terms of accuracy, reaching 99.21%, 98.60%, 96.94% and 94.10%, respectively.
- Research Article
853
- 10.1609/aaai.v31i1.11212
- Feb 12, 2017
- Proceedings of the AAAI Conference on Artificial Intelligence
Human action recognition is an important task in computer vision. Extracting discriminative spatial and temporal features to model the spatial and temporal evolutions of different actions plays a key role in accomplishing this task. In this work, we propose an end-to-end spatial and temporal attention model for human action recognition from skeleton data. We build our model on top of the Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM), which learns to selectively focus on discriminative joints of skeleton within each frame of the inputs and pays different levels of attention to the outputs of different frames. Furthermore, to ensure effective training of the network, we propose a regularized cross-entropy loss to drive the model learning process and develop a joint training strategy accordingly. Experimental results demonstrate the effectiveness of the proposed model, both on the small human action recognition dataset of SBU and the currently largest NTU dataset.
- Conference Article
4
- 10.1109/vtcspring.2019.8746345
- Apr 1, 2019
This paper proposes a three-dimensional (3D) non-stationary fixed-to-fixed indoor channel simulator model for human activity recognition. The channel model enables the formulation of temporal variations of the received signal caused by a moving human. The moving human is modelled by a cluster of synchronized moving scatterers. Each of the moving scatterers in a cluster is described by a 3D deterministic trajectory model representing the motion of specific body parts of a person, such as wrists, ankles, head, and waist. We derive the time-variant (TV) Doppler frequencies caused by the motion of each moving scatterer by using the TV angles of motion, angles of arrival, and angles of departure. Moreover, we derive the complex channel gain of the received signal. Furthermore, we analyze the TV Doppler power spectral density of the complex channel gain by using the concept of the spectrogram and present its expression in approximated form. Also, we derive the TV mean Doppler shift and TV Doppler spread from the approximated spectrogram. The accuracy of the results is validated by simulations. The channel simulator is beneficial for the development of activity recognition systems with non-wearable devices as the demand for such systems has increased recently.
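The spectrogram-based quantities named in the abstract above can be illustrated numerically. The sketch below is an assumption-laden stand-in, not the paper's closed-form derivation: it computes a short-time Fourier spectrogram with a Hann window (the paper's window is not stated here), then takes the time-variant mean Doppler shift as the first spectral moment and the Doppler spread as the square root of the second central moment. All function names and the single-tone test signal are hypothetical.

```python
import numpy as np


def spectrogram(signal, fs, win_len=128, hop=64):
    """Return (freqs, S), where S[k, m] is the power at freqs[k] in frame m."""
    signal = np.asarray(signal)
    window = np.hanning(win_len)  # assumed window choice
    starts = range(0, len(signal) - win_len + 1, hop)
    frames = np.stack([signal[s:s + win_len] * window for s in starts], axis=1)
    # Two-sided spectrum, since a complex channel gain can carry negative Doppler.
    spectrum = np.fft.fftshift(np.fft.fft(frames, axis=0), axes=0)
    freqs = np.fft.fftshift(np.fft.fftfreq(win_len, d=1.0 / fs))
    return freqs, np.abs(spectrum) ** 2


def doppler_moments(freqs, S):
    """Per-frame mean Doppler shift and Doppler spread from a spectrogram."""
    power = S.sum(axis=0)
    mean_shift = (freqs[:, None] * S).sum(axis=0) / power
    second_moment = (freqs[:, None] ** 2 * S).sum(axis=0) / power
    spread = np.sqrt(np.maximum(second_moment - mean_shift ** 2, 0.0))
    return mean_shift, spread


# Hypothetical complex channel gain: one scatterer with a constant 50 Hz Doppler.
fs = 1000.0
t = np.arange(2000) / fs
gain = np.exp(2j * np.pi * 50.0 * t)
freqs, S = spectrogram(gain, fs)
mean_shift, spread = doppler_moments(freqs, S)
```

For this single-tone input the mean shift stays close to 50 Hz in every frame, and the spread reflects only the window's finite frequency resolution.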
- Research Article
9
- 10.1016/j.image.2019.115672
- Oct 25, 2019
- Signal Processing: Image Communication
Double-layer conditional random fields model for human action recognition
- Research Article
20
- 10.3390/s21103381
- May 12, 2021
- Sensors
Clinicians lack objective means for monitoring whether their knee osteoarthritis patients are improving outside of the clinic (e.g., at home). Previous human activity recognition (HAR) models using wearable sensor data have only used data from healthy people, and such models are typically imprecise for people who have medical conditions affecting movement. HAR models designed for people with knee osteoarthritis have classified rehabilitation exercises but not the clinically relevant activities of transitioning from a chair, negotiating stairs and walking, which are commonly monitored for improvement during therapy for this condition. Therefore, it is unknown whether a HAR model trained on data from people who have knee osteoarthritis can be accurate in classifying these three clinically relevant activities. To address this gap, we collected inertial measurement unit (IMU) data from 18 participants with knee osteoarthritis and trained convolutional neural network models to identify chair, stairs and walking activities, and phases. The model accuracy was 85% at the first level of classification (activity), 89–97% at the second (direction of movement) and 60–67% at the third level (phase). This study is the first proof-of-concept that an accurate HAR system can be developed using IMU data from people with knee osteoarthritis to classify activities and phases of activities.
- Research Article
363
- 10.1109/tnnls.2019.2927224
- Jul 19, 2019
- IEEE Transactions on Neural Networks and Learning Systems
Recent years have witnessed the success of deep learning methods in human activity recognition (HAR). The longstanding shortage of labeled activity data inherently calls for a plethora of semisupervised learning methods, and one of the most challenging and common issues with semisupervised learning is the imbalanced distribution of labeled data over classes. Although the problem has long existed in broad real-world HAR applications, it is rarely explored in the literature. In this paper, we propose a semisupervised deep model for imbalanced activity recognition from multimodal wearable sensory data. We aim to address not only the challenges of multimodal sensor data (e.g., interperson variability and interclass similarity) but also the limited labeled data and class-imbalance issues simultaneously. In particular, we propose a pattern-balanced semisupervised framework to extract and preserve diverse latent patterns of activities. Furthermore, we exploit the independence of multi-modalities of sensory data and attentively identify salient regions that are indicative of human activities from inputs by our recurrent convolutional attention networks. Our experimental results demonstrate that the proposed model achieves a competitive performance compared to a multitude of state-of-the-art methods, both semisupervised and supervised ones, with 10% labeled training data. The results also show the robustness of our method over imbalanced, small training data sets.
- Book Chapter
2
- 10.1007/978-3-030-33509-0_29
- Oct 20, 2019
This work presents a model for human activity recognition, through an IoT paradigm, using location data and movement data generated from an accelerometer. The activities of five individuals from different age groups were monitored using IoT devices; the activities of four of these individuals were used to train the model and those of the remaining individual served as test data. The Extra Trees algorithm was used to predict the activities, achieving 81.16% accuracy when only movement data were used, 92.59% when using both movement and location data, and 97.56% when using movement data and synthetic location data.
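The subject-wise evaluation described in the abstract above (train on four individuals, test on the fifth) can be sketched as a leave-one-subject-out split. The helper name, the subject labels and the sample counts below are hypothetical; the abstract does not say which individual was held out.

```python
import numpy as np


def leave_one_subject_out(subject_ids, test_subject):
    """Return (train_idx, test_idx) so that one subject's samples are held out."""
    subject_ids = np.asarray(subject_ids)
    test_mask = subject_ids == test_subject
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Hypothetical labels: two samples per individual, five individuals.
subjects = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
train_idx, test_idx = leave_one_subject_out(subjects, test_subject=5)
```

Splitting by individual rather than by sample keeps one person's data from appearing in both training and test sets. The resulting index arrays could then be used to fit and score a classifier such as scikit-learn's `ExtraTreesClassifier`.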