Multimodal alignment of event and text streams in spiking neural networks for human action recognition
- Conference Article
- 10.1109/iccc51575.2020.9345117
- Dec 11, 2020
Recent research on video-based human action recognition has progressed with the development of 3-dimensional deep convolutional networks (3-D ConvNets). In particular, spatiotemporal features have improved performance. However, the temporal information that commonly exists in video has not been fully exploited by existing 3-D ConvNets. In this paper, we propose a novel Residual Non-degenerate Temporal Network (RNTN) for human action recognition, which exploits the temporal information in frames more fully. Specifically, RNTN mainly consists of residual non-degenerate temporal blocks (RNTB) and 3-D effective channel attention blocks (3D-ECA). The RNTB enhances the expression of temporal features. The 3D-ECA strengthens the potential connections between features by letting each channel's features interact with those of adjacent channels. Our approach achieves state-of-the-art performance on the UCF-101 (98.33%) and HMDB-51 (80.04%) datasets.
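The abstract above describes channel attention in which each channel interacts with its neighbouring channels. A minimal sketch of that general idea follows, assuming the usual recipe of global average pooling, a small 1-D convolution across the channel axis, and a sigmoid gate. The function names, the kernel values, and the flat-list tensor layout are illustrative assumptions, not the authors' 3D-ECA implementation.

```python
import math

def eca_weights(channel_means, kernel):
    """Slide an odd-length 1-D kernel across neighbouring channel descriptors,
    then squash each result with a sigmoid to get one gate per channel."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad  # zero padding at both ends
    return [
        1.0 / (1.0 + math.exp(-sum(k * padded[c + i] for i, k in enumerate(kernel))))
        for c in range(len(channel_means))
    ]

def eca_3d(feature_maps, kernel):
    """feature_maps: one flat list of activations per channel (T*H*W values each)."""
    means = [sum(ch) / len(ch) for ch in feature_maps]  # global average pooling
    gates = eca_weights(means, kernel)
    # Rescale every activation in a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, feature_maps)]

# Hypothetical 3-channel input with two activations per channel.
feature_maps = [[0.0, 0.0], [2.0, 4.0], [0.0, 0.0]]
gated = eca_3d(feature_maps, kernel=[0.0, 1.0, 0.0])
```

With a kernel such as `[1.0, 0.0, 0.0]`, each gate would depend only on the preceding channel, which makes the cross-channel interaction easy to see in isolation.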
- Research Article
24
- 10.1016/j.patcog.2021.108068
- May 27, 2021
- Pattern Recognition
Weakly-supervised temporal attention 3D network for human action recognition
- Book Chapter
5
- 10.1007/978-3-319-49568-2_47
- Nov 25, 2016
Human action recognition has been a significant topic in the field of computer vision. As deep learning develops, deep neural networks are becoming increasingly prevalent in related research. This paper provides a survey of deep neural networks for human action recognition based on skeleton information. Each method is described in detail, and several of the main related datasets are briefly introduced. All surveyed papers were published between 2013 and 2015, which provides an overview of the progress in this area.
- Research Article
3
- 10.57197/jdr-2023-0023
- Aug 18, 2023
- Journal of Disability Research
Aging is associated with a decline in the ability to carry out daily activities and in physical exercise, which affects mental and physical health. Elderly people and patients can depend on a human activity recognition (HAR) system, which monitors activity patterns and flags critical events or behavioral changes. A HAR system incorporated into an Internet of Things (IoT) environment might allow these people to live independently. Because the number of activity groups and sensor measurements is enormous, the HAR problem cannot be resolved deterministically. Hence, machine learning (ML) algorithms have been broadly applied in the development of HAR systems to find patterns of human activity in sensor data. This study therefore presents an Optimal Deep Recurrent Neural Networks for Human Activity Recognition (ODRNN-HAR) on Elderly and Disabled Persons technique for the IoT platform. The intention of the ODRNN-HAR approach is to recognize and classify various kinds of human activities in the IoT environment. First, the ODRNN-HAR technique enables IoT devices to collect human activity data and employs Z-score normalization as a preprocessing step. For effective recognition of human activities, the ODRNN-HAR technique uses the DRNN model. At the final stage, the hyperparameters of the DRNN model are tuned using the mayfly optimization (MFO) algorithm. The ODRNN-HAR algorithm is evaluated on a benchmark HAR dataset, and the outcomes are examined. The comprehensive simulation outcomes highlight the improved recognition results of the ODRNN-HAR approach in terms of different measures.
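The abstract above names Z-score normalization as its preprocessing step. A minimal sketch of that step is given here, assuming the sensor data arrive as rows with one reading per channel. The function name and the sample readings are hypothetical, and the abstract does not say how constant channels are handled, so the fallback to a unit scale is an assumption.

```python
import statistics

def z_score_normalize(samples):
    """Scale each sensor channel to zero mean and unit variance.

    `samples` is a list of rows; each row holds one reading per channel.
    """
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a constant channel (deviation 0) falls
    # back to 1.0 so the division below is always defined (an assumption).
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in samples
    ]

# Hypothetical accelerometer readings (x, y, z), for illustration only.
readings = [[1.0, 10.0, 5.0], [2.0, 20.0, 5.0], [3.0, 30.0, 5.0]]
normalized = z_score_normalize(readings)
```

In practice the means and deviations would be computed on the training split only and reused for test data, so that no information leaks from the evaluation set.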
- Book Chapter
9
- 10.1007/978-3-030-04167-0_23
- Jan 1, 2018
Recently, convolutional neural networks (CNNs) have been extensively applied to human action recognition in videos, fusing appearance and motion information through two-stream networks. However, performance on human action recognition in videos still lags far behind that of still-image recognition because temporal information is difficult to extract. In this paper, we propose a multi-stream architecture with convolutional neural networks for human action recognition in videos to extract more temporal features. We make three contributions: (a) we present a multi-stream architecture with 3D and 2D convolutional neural networks that takes still RGB frames, dense optical flows and gradient maps as separate network inputs; (b) we propose a novel 3D convolutional neural network with residual blocks, and use a deep 2D convolutional neural network, augmented with attention blocks, as the pre-trained network to extract the major motion information; (c) we fuse the multi-stream networks using weights assigned not only per network but also per action category, to take advantage of the optimal performance of each network. Our networks are trained and evaluated on the standard video action benchmarks UCF-101 and HMDB-51, and the results show that our method achieves recognition performance comparable to the state of the art.
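Contribution (c) above describes fusing streams with weights that vary per network and per action category. One plausible reading is sketched below: every stream outputs a score per class, and each (stream, class) pair has its own weight. The scores, the weights, and the function name are made up for illustration, and the abstract does not state how the weights are obtained.

```python
def fuse_streams(stream_scores, class_weights):
    """Combine per-stream class scores using one weight per (stream, class) pair."""
    num_classes = len(stream_scores[0])
    fused = [0.0] * num_classes
    for scores, weights in zip(stream_scores, class_weights):
        for c in range(num_classes):
            fused[c] += weights[c] * scores[c]
    return fused

# Hypothetical class scores from three streams (RGB, optical flow, gradient maps).
stream_scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
]
# Hypothetical weights: row = stream, column = action category.
class_weights = [
    [0.5, 0.1, 0.3],
    [0.3, 0.7, 0.3],
    [0.2, 0.2, 0.4],
]
fused = fuse_streams(stream_scores, class_weights)
predicted = max(range(len(fused)), key=fused.__getitem__)  # index of the top class
```

Here the optical-flow stream is trusted most for the second category, so that class wins even though the RGB stream alone would have picked the first.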
- Research Article
19
- 10.1109/jbhi.2022.3219364
- Jan 1, 2023
- IEEE Journal of Biomedical and Health Informatics
In recent years, human activity recognition (HAR) technologies in e-health have attracted broad interest. In the literature, mainstream works focus on the body's spatial information (i.e., postures) and do not interpret the key bioinformatics associated with movements, which limits their use in applications that require a comprehensive evaluation of how correctly motion tasks are performed. To address this issue, this article presents a Wearables-based Multi-column Neural Network (WMNN) for HAR based on multi-sensor fusion and deep learning. The Tai Chi Eight Methods are used as an example, since both postures and muscle activity strengths are significant in them. The work was validated with 14 recruited subjects, and we experimentally show 96.9% and 92.5% accuracy for training and testing, respectively, over a total of 144 postures and corresponding muscle activities. The method is then equipped with a human-machine interface (HMI), which gives users motion suggestions (i.e., on postures and muscle strength). The report demonstrates that the proposed HAR technique can enhance users' self-training efficiency, potentially promoting the development of the HAR area.
- Research Article
8
- 10.1109/jiot.2024.3384872
- Jul 1, 2024
- IEEE Internet of Things Journal
Wireless local-area network (WLAN) sensing offers advantages over other approaches to human activity recognition (HAR) for Internet of Things (IoT) applications, including privacy as well as adaptability to non-line-of-sight scenarios. This is why HAR plays an important role in the upcoming IEEE 802.11bf Wi-Fi standard, which aims to bring the adoption of WLAN sensing to a much larger scale. In this paper, we propose CapsHAR, a model based on capsule networks, which uses channel state information (CSI) from Wi-Fi signals to accurately perform human activity recognition. We evaluate the capability of the model on a variety of datasets, including large and small-scale gestures, as well as compare its performance to a variety of models and approaches. We then extend the CapsHAR model into a distributed architecture in order to eliminate the communication overhead of sending CSI data from multiple access points (AP) to a single server. We propose the use of edge computing to run CapsHAR at each AP separately, then combine the outputs of the models through a Fresnel zone-based voting scheme which makes more efficient use of spatial diversity. Overall, the CapsHAR architecture consistently achieves classification accuracy surpassing that of the state-of-the-art models, demonstrating the viability of capsule networks for reliable HAR in Wi-Fi-based IoT applications.
- Research Article
- 10.1109/tim.2025.3612626
- Jan 1, 2025
- IEEE Transactions on Instrumentation and Measurement
The application of the 3-D radar data cube (RDC) which integrates time, distance and Doppler frequency information for accurate human activity recognition (HAR), has attracted much recent research interest in the field of smart healthcare. However, existing methods often fail to fully exploit the temporal-spatial characteristics and the anisotropic nature of RDC, limiting their performance in HAR. To address these limitations, we propose a new temporal-spatial anisotropic radar data cube network (TSARDC-Net) for HAR. This network utilizes a convolutional neural networks-long short-term memory (CNN-LSTM) architecture to simultaneously extract spatial and temporal features from radar signals, aiming to obtain joint modeling of the temporal-spatial characteristics of human motion. We adopted a unique anisotropic multi-scale convolution (AMSC) module to address the anisotropic spatial distribution characteristics of RDC and enhance feature extraction capability. We also introduced Squeeze and Excitation normalization (SENM) to adjust the learned features, thereby improving the model’s ability to recognize action features. Furthermore, considering practical deployment requirements, we explored a lightweight strategy based on separable convolutions. We used a public dataset which includes 1,754 samples, recording 6 different human activities. In addition, we recruited a group of volunteers using an off-the-shelf WiFi radar device and obtained a dataset containing 2,148 samples of 5 different activities. TSARDC-Nets were trained separately on these two datasets. Experimental results show that, on the public dataset, the proposed method achieves a classification accuracy of 98.58%, outperforming existing methods. Additionally, the proposed method achieves an accuracy of 95.57% on our dataset, showing good generalization capability.
- Book Chapter
5
- 10.1007/978-3-319-73008-0_43
- Jan 1, 2018
We investigate distributing convolutional neural networks (CNNs) for human activity recognition across computing nodes collocated with sensors at specific regions (body, arms and legs) on the wearer. We compare four CNN architectures. Two use a centralized, monolithic approach, and two are distributed across a number of computing nodes. A distributed CNN is implemented on a network of Intel Edison nodes, demonstrating the capability of performing real-time classification. While the accuracy of the distributed approaches is slightly worse than that of the monolithic CNNs, exploiting the hierarchy of the problem turns out to require much less memory (and therefore computation) than the monolithic CNNs, and only modest communication rates between nodes in the model, making the approach viable for a wide range of distributed systems, from wearable robots to multi-robot swarms.
- Conference Article
33
- 10.1145/3389189.3397991
- Jun 26, 2020
Due to the rising cost of social care, the number of older adults who prefer to live independently in their own homes has increased. An independent lifestyle cannot be achieved if the elderly user suffers from mild cognitive impairment unless a suitable assistive environment is provided to monitor and recognise daily activities. Different techniques are employed for gathering data representing the user's activities. Available systems with wearable sensors or camera devices are undesirable to many users due to privacy issues. This paper proposes the use of a Deep Convolutional Neural Network (DCNN) for human activity recognition using binary ambient sensors such as Passive Infrared (PIR) and door sensors. Each activity is represented as a binary string, which is converted into a greyscale image. Uncorrelated features are selected and then used as inputs to Adaptive Boosting (AdaBoost) and Fuzzy C-means (FCM) classifiers for recognising Activities of Daily Living (ADL). The performance of the proposed model is evaluated using a dataset representing the ADL of a single user. Using the features extracted from the greyscale images, the AdaBoost and FCM algorithms achieve 99.5% and 86.4%, respectively.
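The abstract above states that each activity's binary string is converted into a greyscale image but does not give the mapping. The sketch below assumes one simple possibility: every 8 bits are packed into one pixel intensity (0 to 255) and the pixels are laid out in fixed-width rows. The function name, the zero padding, and the example bit string are all hypothetical.

```python
def bits_to_greyscale(bits, width):
    """Pack a string of '0'/'1' characters into rows of 8-bit pixel values."""
    # Pad with zeros so the bit string splits evenly into bytes.
    padded = bits + "0" * (-len(bits) % 8)
    pixels = [int(padded[i:i + 8], 2) for i in range(0, len(padded), 8)]
    # Pad the final row with black pixels to keep the image rectangular.
    pixels += [0] * (-len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

# Hypothetical sensor activations: 32 binary readings become a 2x2 image.
sensor_bits = "11111111" "00000000" "10000000" "00000001"
image = bits_to_greyscale(sensor_bits, width=2)
# image is [[255, 0], [128, 1]]
```

An alternative mapping would render each bit as a fully black or fully white pixel. The byte-packed form is shown only because it yields genuinely grey intensities.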
- Research Article
- 10.52783/pst.1940
- May 28, 2025
- Power System Technology
In recent years, Human Activity Recognition (HAR) has evolved into a critical research area within computer vision, driven by advancements in video-based action recognition techniques. Unlike image-based methods, video-based HAR leverages spatial and temporal information, offering a richer understanding of human behaviors. This area has found applications in diverse domains, including education, intelligent surveillance, healthcare, entertainment, and autonomous systems. With the spread of cameras and sensing devices, there is an increasing demand for automated HAR systems that use computationally intelligent methods such as Deep Learning (DL) and Machine Learning (ML). This paper delivers a detailed study of DL and ML techniques applied to HAR between 2014 and 2025. It explores various modalities used for action recognition, including RGB-D cameras, audio, and inertial sensors, and examines their roles in enhancing HAR performance. A detailed analysis of public datasets is presented, highlighting their characteristics, strengths, and limitations. Additionally, this survey explores action representation, dimensionality reduction, and action analysis methods, identifying their respective advantages and drawbacks. The paper also discusses applications of HAR, including human-computer interaction, remote health monitoring, virtual reality, and abnormal behavior detection, emphasizing its transformative impact on these fields. Key challenges, such as scalability, real-time processing, and environmental variability, are outlined, along with future research directions aimed at developing robust and efficient HAR systems. This survey serves as a valuable resource for researchers and practitioners, providing insights into state-of-the-art techniques and facilitating further advancements in HAR.
- Research Article
135
- 10.1109/tai.2021.3076974
- Apr 1, 2021
- IEEE Transactions on Artificial Intelligence
Video-based human action recognition is one of the most important and challenging areas of research in the field of computer vision. Human action recognition has found many practical applications in video surveillance, human-computer interaction, entertainment, autonomous driving, etc. Owing to the recent development of deep learning methods for human action recognition, performance on challenging datasets has improved significantly. Deep learning techniques are mainly used for recognizing actions in images and videos comprising Euclidean data. A recent development in deep learning methods is the extension of these techniques to non-Euclidean data, or graph data with many nodes and edges. The human body skeleton resembles a graph; therefore, the graph convolutional network (GCN) is applicable to the non-Euclidean body skeleton. In the past few years, GCN has emerged as an important tool for skeleton-based action recognition. Therefore, we conduct a survey of GCN methods for action recognition. Herein, we present a comprehensive overview of recent GCN techniques for action recognition, propose a taxonomy for the categorization of GCN techniques for action recognition, carry out a detailed study of the benchmark datasets, list relevant resources and open-source codes, and finally provide an outline of future research directions and trends. To the best of the authors' knowledge, this is the first survey of action recognition using GCN techniques.
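To make the idea of treating a skeleton as a graph concrete, the following sketch applies one graph-convolution step to a toy skeleton. It assumes the widely used symmetric-normalized propagation rule, ReLU(D^{-1/2} (A + I) D^{-1/2} X W). The three-joint chain, the joint coordinates, and the identity weight matrix are illustrative assumptions and are not taken from any specific method covered by the survey.

```python
import math

def normalized_adjacency(edges, num_joints):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    # Start from the identity so every joint keeps a self-connection.
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a]
    return [
        [inv_sqrt_deg[i] * a[i][j] * inv_sqrt_deg[j] for j in range(num_joints)]
        for i in range(num_joints)
    ]

def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def gcn_layer(a_hat, features, weights):
    """One propagation step: ReLU(A_hat @ X @ W)."""
    mixed = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in mixed]

# Hypothetical 3-joint chain (shoulder, elbow, wrist) with 2-D joint coordinates.
edges = [(0, 1), (1, 2)]
a_hat = normalized_adjacency(edges, 3)
features = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the output shows pure neighbour mixing
out = gcn_layer(a_hat, features, weights)
```

Each output row is a degree-weighted blend of a joint's own features and those of its direct neighbours. Stacking such layers lets information travel further along the skeleton.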
- Research Article
51
- 10.1109/access.2019.2962284
- Jan 1, 2020
- IEEE Access
One of the challenging tasks in the field of artificial intelligence is human action recognition. In this paper, we propose a novel long-term temporal feature learning architecture for recognizing human action in video, named Pseudo Recurrent Residual Neural Networks (P-RRNNs), which exploits a recurrent architecture and composes each network with different connections among units. A two-stream CNN model (GoogLeNet) is employed to extract local temporal and spatial features, respectively. The local spatial and temporal features are then integrated into global long-term temporal features by our proposed two-stream P-RRNNs. Finally, a Softmax layer fuses the outputs of the two-stream P-RRNNs for action recognition. The experimental results on two standard databases, UCF101 and HMDB51, demonstrate the strong performance of the proposed architecture for human action recognition.
- Research Article
113
- 10.1016/j.jmsy.2020.04.007
- Apr 29, 2020
- Journal of Manufacturing Systems
Transferable two-stream convolutional neural network for human action recognition
- Research Article
9
- 10.1016/j.patrec.2021.08.017
- Nov 1, 2021
- Pattern Recognition Letters
Learning Video Actions in Two Stream Recurrent Neural Network