A data-centric approach to radar-based human action recognition: SVD-based clutter removal and RTM/DTM feature fusion

Similar Papers
  • Supplementary Content
  • 10.4225/03/58a67f1c7cd0a
Visual cues for view-invariant human action recognition
  • Feb 17, 2017
  • Figshare
  • N A Anwaar-Ul-Haq

Human action is a visually complex phenomenon. Visual representation, analysis and recognition of human actions have become a key focus of research in computer vision, artificial intelligence, robotics and other related scientific disciplines. Applications of automated action recognition include, but are not limited to, intelligent health care monitoring, smart homes, content-based video search, animation and entertainment, human-computer interaction and intelligent video surveillance. All of these application areas revolve around a fundamental question: given a human subject doing something in the field of sensory input, what is the person doing? If a machine can answer this question correctly, it can greatly benefit the development and practical use of computer vision systems. However, machine recognition of human action is a daunting task due to complex motion dynamics, anthropometric variations, occlusion and a strong dependency on camera viewpoint. In this thesis, we exploit rich visual cues from human actions and use them to propose solutions to human action recognition. The important problem of view invariance under viewpoint variations is taken as a case study. We collect and explore these visual cues from geometrical relationships, spatio-temporal patterns and features, frequency-domain signal analysis and contextual associations of actions, and derive action representations for machine recognition. Actions are spatio-temporal patterns, and temporal order plays an important role in their interpretation. We therefore explore the invariance of temporal order during action execution and use it to devise a new view-invariant action recognition approach. We apply an order constraint and feature fusion to local spatio-temporal features. These features are the representation of choice for action recognition because of their computational simplicity and their robustness to occlusion and minor viewpoint changes.
We introduce STOPs (spatio-temporal ordered packets), which combine the discriminative characteristics of multiple features for better recognition performance. In addition, we introduce a spatio-temporal ordering constraint that removes the orderless nature of the bag-of-features framework for action recognition. Furthermore, to deal with the limitations of feature-based approaches, we explore multiple view geometry, which has alleviated various complex problems in computer vision. We thoroughly study applications of the static and multi-body flow fundamental matrices for relating information across views. We introduce spatio-temporally consistent dense optical flow to avoid explicit manual detection of human body landmark points and explicit point correspondences. We employ a rank constraint to derive novel tracking-free and training-free action similarity measures across viewpoint variations. Next, we observe that despite the considerable success of geometrical techniques, the computational cost of dense optical flow calculation is a hindrance. We therefore study frequency-domain analysis of action sequences. This leads to the derivation of spatio-temporal correlation filters that use frequency-domain filtering to give fast and efficient solutions to action recognition. However, these filters are inherently view-dependent. To achieve view invariance, we explore view clustering, which extends the frequency-domain techniques. Contextual information is another important cue for interpreting human actions, especially when actions exhibit interactive relationships with their context. These contextual cues become even more crucial when videos are captured in unfavorable conditions such as extremely low-light nighttime scenarios. We therefore take night vision as a case study and present contextual action recognition at nighttime.
We find that context enhancement is imperative in such a challenging multi-sensor environment to achieve reliable action recognition, which leads us to develop novel context enhancement techniques for night vision using multi-sensor image fusion. Extensive experiments are performed on well-known action datasets, and the results are compared with existing action recognition approaches in the literature. The findings of this thesis encourage the exploitation of spatio-temporal visual cues for deriving novel action recognition approaches and improving their performance.
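The thesis credits the speed of its correlation filters to frequency-domain filtering. The sketch below shows only the underlying identity on a 1-D signal, whereas the thesis works with spatio-temporal volumes. Circular cross-correlation equals the inverse DFT of DFT(signal) times the conjugate of DFT(template). A naive standard-library DFT stands in for an FFT. The function names are hypothetical, and this is not the thesis's filter design.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (a stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def correlate_direct(signal, template):
    """Circular cross-correlation c[m] = sum_t template[t] * signal[(t + m) % n]."""
    n = len(signal)
    return [sum(template[t] * signal[(t + m) % n] for t in range(n)) for m in range(n)]


def correlate_frequency(signal, template):
    """Same correlation, computed as IDFT(DFT(signal) * conj(DFT(template)))."""
    product = [s * t.conjugate() for s, t in zip(dft(signal), dft(template))]
    return [v.real for v in idft(product)]


template = [1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
shift = 3
signal = [template[(u - shift) % len(template)] for u in range(len(template))]
response = correlate_frequency(signal, template)
peak = max(range(len(response)), key=response.__getitem__)
print(peak)  # prints 3: the correlation peak sits at the applied shift
```

With an FFT, the spectrum product costs O(n log n) rather than the O(n^2) of direct correlation. That difference is the motivation the abstract gives for moving to the frequency domain.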

  • Research Article
  • Citations: 117
  • 10.1016/j.neucom.2020.06.032
Human action recognition using convolutional LSTM and fully-connected LSTM with different attentions
  • Jun 16, 2020
  • Neurocomputing
  • Zufan Zhang + 3 more

  • Conference Article
  • Citations: 3
  • 10.1109/icet51757.2021.9451065
Personalized Human Activity Recognition using Hypergraph Learning with Fusion Features
  • May 7, 2021
  • Luqi Wang + 5 more

Human activity recognition (HAR) is a promising field with a wide range of applications in medicine, electronic forensics and the Internet of Things. To date, existing works have generally focused on manually extracting statistical features or on applying deep learning to extract deep features for activity recognition. However, most studies neglect individual differences among users, which degrades the performance of a trained model when it is applied to new users. In this paper, we propose a novel approach using hypergraph learning for personalized human activity recognition based on fusion features. The fusion features take advantage of both deep features and statistical features, and also incorporate the user's personalization factors to reduce the influence of individual differences. In the classification stage, a hypergraph learning algorithm recognizes users' activities from the fusion features. Experiments on the public USC-HAD dataset (11 classes) and a self-collected dataset (6 classes) show that the proposed method outperforms existing methods and has greater potential for personalized human activity recognition in daily life.
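The abstract combines statistical features, deep features and personalization factors into "fusion features" but does not say how. The minimal sketch below assumes plain concatenation and an illustrative set of per-axis statistics: mean, population standard deviation, min and max. `fusion_features` and the user-factor vector are hypothetical, and the hypergraph classifier is not shown.

```python
from statistics import fmean, pstdev


def statistical_features(axis_samples):
    """Hand-crafted statistics for one sensor axis (an assumed, illustrative set)."""
    return [fmean(axis_samples), pstdev(axis_samples), min(axis_samples), max(axis_samples)]


def fusion_features(window, deep_features, user_factors):
    """Concatenate per-axis statistics, a deep embedding and user factors.

    window: list of axes (e.g. accelerometer x, y, z), each a list of samples.
    deep_features: embedding from some trained network; here just a vector.
    user_factors: hypothetical personalization vector (e.g. normalized height).
    """
    stats = [value for axis in window for value in statistical_features(axis)]
    return stats + list(deep_features) + list(user_factors)


window = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
vector = fusion_features(window, deep_features=[0.3, -0.1], user_factors=[1.75])
print(len(vector))  # 2 axes * 4 statistics + 2 deep + 1 user = 11
```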

  • Conference Article
  • 10.54941/ahfe1005815
An Action Recognition Method based on 3D Feature Fusion
  • Jan 1, 2025
  • AHFE international
  • Yinhao Xu + 1 more

Video, unlike a single image, has both spatial and temporal dimensions. In the spatial dimension, it contains visual elements similar to those in static images, but the temporal dimension makes it far more complex. Video includes static image features such as color, texture, shape and edge information that are crucial for identifying objects within each frame. Motion features also play a significant role, as they describe the movement of objects over time, including velocity, acceleration and direction of movement. In addition, external factors such as lighting conditions, background clutter and occlusions affect the overall nature of the video. As an important branch of video understanding, human action recognition has attracted widespread attention from both the research community and industry. The ability to accurately recognize human actions in videos has numerous applications, ranging from surveillance systems to human-computer interaction, sports analysis and entertainment. At present, there are three mainstream approaches to processing video data for action recognition: C3D, two-stream networks and (2+1)D networks. SlowFast is a typical variant of C3D. The core of SlowFast is to process videos using two pathways, the Slow pathway and the Fast pathway. Compared with the Fast pathway, the Slow pathway has a lower frame rate but a greater number of channels. The Slow pathway captures spatial semantic information, that is, the relatively static information in the video. The Fast pathway, in contrast, has a higher frame rate but fewer channels, which greatly reduces its computational complexity.
At the same time, this weakens the Fast pathway's ability to model spatial information and makes it focus on information that changes noticeably over time. The Slow and Fast pathways do not operate independently: they exchange information through multiple lateral connections. This fusion is unidirectional, because the lateral connections run from Fast to Slow, so the Fast pathway receives no information from the Slow pathway. This inevitably loses some of the semantic information that describes space. We believe that a more effective feature fusion method can further improve recognition accuracy. Building on the well-known two-branch SlowFast network, this paper proposes an enhanced SlowFast network named ESL Net. A key innovation of this network is an improved 3D feature fusion module, designed to make the most of the temporal information in the video for effective feature fusion. It employs temporal and spatial attention mechanisms to identify the most significant parts of the features. By analyzing the temporal information, it can also determine the crucial elements between dual-temporal features. Extensive experiments on the UCF-101 and HMDB51 datasets show that the proposed method is effective and outperforms existing methods, especially the SlowFast network, in accuracy and robustness on human action recognition tasks.
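The following is a minimal, dependency-free sketch of the unidirectional Fast-to-Slow lateral connection described above. It assumes the "time-to-channel" variant from the original SlowFast design, which folds α consecutive Fast time steps into the channel axis. It also assumes the commonly used settings τ = 16, α = 8 and β = 1/8. Spatial dimensions are omitted, and the helper names are hypothetical. This is not ESL Net's fusion module.

```python
def sample_indices(num_frames, stride):
    """Frame indices read by a pathway with the given temporal stride."""
    return list(range(0, num_frames, stride))


def time_to_channel(fast_feats, alpha):
    """Fold every `alpha` consecutive Fast time steps into one step's channels.

    fast_feats: list over time of per-step channel vectors (spatial axes omitted).
    """
    if len(fast_feats) % alpha:
        raise ValueError("Fast length must be a multiple of alpha")
    return [[c for step in fast_feats[i:i + alpha] for c in step]
            for i in range(0, len(fast_feats), alpha)]


def lateral_fuse(slow_feats, fast_feats, alpha):
    """Fast -> Slow lateral connection: append folded Fast channels to Slow channels."""
    folded = time_to_channel(fast_feats, alpha)
    if len(folded) != len(slow_feats):
        raise ValueError("temporal lengths do not match after folding")
    return [s + f for s, f in zip(slow_feats, folded)]


# Illustrative settings: tau = 16, alpha = 8, beta = 1/8, on a 64-frame clip.
tau, alpha = 16, 8
slow_idx = sample_indices(64, tau)             # 4 frames
fast_idx = sample_indices(64, tau // alpha)    # 32 frames
slow_feats = [[1.0] * 8 for _ in slow_idx]     # 8 Slow channels per step
fast_feats = [[float(t)] for t in fast_idx]    # 1 Fast channel per step (beta = 1/8)
fused = lateral_fuse(slow_feats, fast_feats, alpha)
print(len(fused), len(fused[0]))  # 4 16
```

Nothing flows from `slow_feats` back into `fast_feats`, which is the one-way property the abstract identifies as a source of lost spatial semantics.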

  • Research Article
  • Citations: 1
  • 10.17485/ijst/v17i21.3203
Multi-dimensional CNN Based Feature Extraction with Feature Fusion and SVM for Human Activity Recognition in Surveillance Videos
  • May 25, 2024
  • Indian Journal Of Science And Technology
  • Hetal Shah + 1 more

Background/Objectives: Accurately recognizing human activities from video sequences is very challenging due to low resolution, cluttered backgrounds, partial occlusion and varying viewpoints. Automated, machine learning (ML) based HAR from surveillance videos requires the fusion of several feature extraction techniques. Methods: In this paper, an SVM with feature fusion is used for automatic recognition in surveillance videos. A Histogram of Oriented Gradients (HOG) is used to segment each frame, separating humans from other objects and background noise in the input video frames. Multiple features are then extracted using the Gabor Wavelet Transform (GWT), an autocorrelogram, the Gray-Level Co-Occurrence Matrix (GLCM), an HSV histogram and a multi-dimensional CNN. The proposed approach is implemented in MATLAB and compared with existing approaches such as Space-Time Interest Points (STIP) and Histogram of Optical Flow (HOF). Findings: The proposed approach outperforms the existing approaches, with lower time consumption and high accuracy: 99.886% on the UCF101 dataset and 99.538% on the UTKinect dataset. Novelty: The most discriminative feature information is obtained with the feature-level fusion technique, and human actions are recognized from this information with the classification algorithm. Keywords: Human activity recognition, Machine Learning, Surveillance Videos, Human detection algorithm, Feature extraction, SVM classifier
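Feature-level fusion, as named in the Novelty statement, can be sketched as follows. The sketch standardizes each descriptor block separately and concatenates the blocks per sample before classification. Per-block z-scoring is an assumption added so that no single descriptor dominates by scale. The abstract does not describe its normalization, and the helper names and toy values are hypothetical. In practice the means and deviations would be computed on training data only and reused at test time.

```python
from statistics import fmean, pstdev


def zscore_columns(rows):
    """Standardize each feature column across samples (constant columns map to 0)."""
    columns = list(zip(*rows))
    stats = [(fmean(col), pstdev(col) or 1.0) for col in columns]
    return [[(v - mean) / std for v, (mean, std) in zip(row, stats)] for row in rows]


def feature_level_fusion(*blocks):
    """Normalize each descriptor block separately, then concatenate per sample.

    Each block is a list of per-sample feature vectors, in the same sample order.
    """
    normalized = [zscore_columns(block) for block in blocks]
    return [[v for part in parts for v in part] for parts in zip(*normalized)]


texture = [[1.0], [3.0]]              # e.g. one GLCM statistic per sample (toy values)
color = [[10.0, 5.0], [20.0, 5.0]]    # e.g. two HSV-histogram bins (toy values)
print(feature_level_fusion(texture, color))  # [[-1.0, -1.0, 0.0], [1.0, 1.0, 0.0]]
```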

  • Book Chapter
  • Citations: 4
  • 10.1007/978-3-030-15235-2_81
Human Action Recognition Based on Fusion Features
  • Apr 25, 2019
  • Shiqiang Yang + 4 more

Human action recognition has wide application prospects in areas such as artificial intelligence and human-computer interaction. Action feature models and action recognition models are the basis of human action recognition. Based on a simplified human skeleton model, complementary features such as the main joint angles, joint speeds and relative joint positions are extracted and fused to describe behavioral gestures, and each action is expressed as a series of gestures. On this basis, a behavioral action model is established. To facilitate computation, Fourier interpolation is applied to each action sample in the action database, taking the most characteristic dimension of the action videos as the standard, so that all action samples have consistent, normalized feature dimensions. Principal component analysis is then used to extract the main components of the features, reducing the feature dimensionality and redundant information. A one-to-many multi-class action recognition model is established based on support vector machine theory. Action recognition experiments on an open human action video database show that the algorithm has good adaptability and practicality.
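The Fourier interpolation step above brings every action sample to a common length. The minimal sketch below evaluates the DFT-based trigonometric interpolant of one feature channel at `m` evenly spaced points. It assumes the sequence is treated as periodic and that the target length `m` is the chosen "standard" dimension. The PCA and SVM stages are not shown, and `fourier_resample` is a hypothetical name.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fourier_resample(x, m):
    """Resample a real sequence of length n to length m (periodic extension assumed).

    Evaluates the trigonometric interpolant defined by the DFT of x at the
    m evenly spaced positions t = j * n / m on the original sample grid.
    """
    n = len(x)
    spectrum = dft(x)
    out = []
    for j in range(m):
        t = j * n / m
        total = spectrum[0].real
        for k in range(1, (n + 1) // 2):
            # X[n-k] is the conjugate of X[k] for real input, so each pair is 2*Re(...)
            total += 2 * (spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)).real
        if n % 2 == 0:
            # The Nyquist bin has no partner; a cosine keeps the interpolant real.
            total += spectrum[n // 2].real * math.cos(math.pi * t)
        out.append(total / n)
    return out


angles = [math.cos(2 * math.pi * t / 8) for t in range(8)]  # a toy joint-angle trace
resampled = fourier_resample(angles, 16)
```

When `m` is a multiple of `n`, every (m/n)-th output reproduces an original sample exactly (up to rounding). A band-limited trace such as `angles` is recovered at the new points as well.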

  • Research Article
  • Citations: 32
  • 10.1016/j.cogsys.2019.05.002
Human action recognition from RGB-D data using complete local binary pattern
  • May 15, 2019
  • Cognitive Systems Research
  • S Arivazhagan + 3 more

  • Research Article
  • 10.1631/eng.itee.2025.0177
An Attention Mechanism-Based Multi-Domain Feature Fusion Approach for Active Sonar Target Recognition
  • Feb 1, 2026
  • ENGINEERING Information Technology & Electronic Engineering
  • Tongjing Sun + 3 more

Due to the complex and changeable marine environment, active sonar target recognition has long been a difficult problem in underwater acoustics. Deep learning-based fusion recognition offers an effective way to address it, but relying on simple concatenation to fuse multi-domain features can cause information redundancy and makes it hard to mine the correlations between domains effectively. This paper therefore proposes an attention mechanism-based multi-domain feature fusion approach for active sonar target recognition. After preprocessing the active sonar echo signals, the method builds a multi-domain feature extraction and fusion network. The network uses a one-dimensional convolutional neural network with long short-term memory (1DCNN-LSTM) and a two-dimensional convolutional neural network (2DCNN) with channel attention to extract deep features from different domains. It then performs intra- and cross-domain feature fusion by combining feature concatenation with multi-domain cross-attention. This can effectively eliminate redundant information and promote inter-domain information interaction while retaining as many target features as possible. Experimental results show that, compared with single-domain methods, the network using attention-based multi-domain feature fusion strengthens cross-domain information interaction and significantly improves feature representation capability. Compared with other methods, the proposed method has clear performance advantages and maintains stable generalization in scenarios with low signal-to-clutter ratios.
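The cross-domain fusion described above rests on attention in which tokens from one domain query tokens from another. The minimal sketch below uses scaled dot-product cross-attention followed by concatenation. It assumes identity projections, a single head and equal token dimensions, whereas a real network would learn query, key and value projection matrices. The function names are hypothetical, and this is not the paper's architecture.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: tokens of domain A attend to tokens of domain B.

    queries: d-dim vectors from domain A (e.g. time-domain tokens)
    keys, values: vectors from domain B (e.g. frequency-domain tokens)
    Learned projection matrices are omitted (identity) to keep the sketch short.
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, values))
                         for j in range(len(values[0]))])
    return attended


def fuse_domains(tokens_a, tokens_b):
    """Concatenate each domain-A token with what it gathered from domain B."""
    gathered = cross_attention(tokens_a, tokens_b, tokens_b)
    return [a + g for a, g in zip(tokens_a, gathered)]


time_tokens = [[0.2, 0.9], [0.7, 0.1]]
freq_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(fuse_domains(time_tokens, freq_tokens))
```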

  • Research Article
  • Citations: 7
  • 10.3233/jifs-233498
Hybrid optimized multimodal spatiotemporal feature fusion for vision-based sports activity recognition
  • Jan 10, 2024
  • Journal of Intelligent & Fuzzy Systems
  • M Amsaprabhaa

Vision-based Human Activity Recognition (HAR) is a challenging research task in sports. This paper aims to track players' movements and recognize different types of sports activities in videos. The proposed work develops a Hybrid Optimized Multimodal SpatioTemporal Feature Fusion (HOM-STFF) model that uses skeletal information for vision-based sports activity recognition. The HOM-STFF model is a deep multimodal feature fusion approach that combines the features generated by a multichannel 1DCNN and a 2D-CNN through concatenative feature fusion. The fused features are fed into a 2-GRU model that generates temporal features for activity recognition. The nature-inspired Bald Eagle Search Optimizer (BESO) is applied to optimize the network weights during training. Finally, the performance of the classification model is evaluated and compared for identifying different activities in sports videos. Experiments were carried out on three vision-based sports datasets, namely Sports Videos in the Wild (SVW), UCF50 sports actions and a self-built dataset, achieving accuracy rates of 0.9813, 0.9506 and 0.9733, respectively. The results indicate that the proposed HOM-STFF model outperforms other state-of-the-art methods in activity detection capability.
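The pipeline above concatenates the features of two branches at each time step and passes the result through a GRU. It can be sketched with a single GRU layer in the Cho et al. formulation. This is an assumption: the abstract's "2-GRU" may mean two stacked layers, and BESO-based weight optimization is not shown. The weight dictionary keys and helper names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def gru_step(x, h, p):
    """One GRU step: update gate z, reset gate r, candidate n, new state."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["Wn"], x), matvec(p["Un"], rh), p["bn"])]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def run_gru(branch_1d, branch_2d, h0, p):
    """Concatenate per-step features from the two branches, then run the GRU over time."""
    h = h0
    for f1, f2 in zip(branch_1d, branch_2d):
        h = gru_step(f1 + f2, h, p)
    return h


def zero_params(hidden, inputs):
    """All-zero weights: every gate outputs 0.5 and the candidate is 0."""
    p = {}
    for gate in ("z", "r", "n"):
        p["W" + gate] = [[0.0] * inputs for _ in range(hidden)]
        p["U" + gate] = [[0.0] * hidden for _ in range(hidden)]
        p["b" + gate] = [0.0] * hidden
    return p


h = run_gru([[0.1, 0.2], [0.3, 0.4]], [[0.5], [0.6]], [1.0, 2.0], zero_params(2, 3))
print(h)  # [0.25, 0.5]: with all-zero weights each step halves the state
```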

  • Research Article
  • Citations: 256
  • 10.1007/s11042-020-08806-9
Human action recognition using fusion of multiview and deep features: an application to video surveillance
  • Mar 14, 2020
  • Multimedia Tools and Applications
  • Muhammad Attique Khan + 6 more

Human Action Recognition (HAR) has become one of the most active research areas in artificial intelligence, owing to applications such as video surveillance. The wide range of variation among human actions in daily life makes recognition more difficult. In this article, a new fully automated scheme is proposed for human action recognition by fusing deep neural network (DNN) features and multiview features. The DNN features are first extracted using a pre-trained CNN model named VGG19. Next, multiview features are computed from horizontal and vertical gradients, along with vertical directional features. All features are then combined, and the best features are selected using three parameters: relative entropy, mutual information and the strong correlation coefficient (SCC). These parameters are used to select the best subset of features through a higher-probability-based threshold function. The selected features are provided to a Naive Bayes classifier for final recognition. The proposed scheme is tested on five datasets, namely HMDB51, UCF Sports, YouTube, IXMAS and KTH, achieving accuracies of 93.7%, 98%, 99.4%, 95.2% and 97%, respectively. Finally, the proposed method is compared with existing techniques, and the results show that it outperforms state-of-the-art methods.
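Of the three selection criteria listed above, mutual information is the simplest to illustrate. The minimal sketch below discretizes each feature into equal-width bins and scores it against the class labels. It keeps features whose score exceeds a fixed threshold. Equal-width binning and the fixed threshold are assumptions: the abstract's probability-based threshold function, relative entropy and SCC are not reproduced. The helper names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())


def discretize(values, bins):
    """Equal-width binning; a constant feature falls into a single bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def select_features(X, y, threshold, bins=4):
    """Return indices of features whose MI with the labels exceeds `threshold`."""
    scores = [mutual_information(discretize(col, bins), y) for col in zip(*X)]
    return [i for i, s in enumerate(scores) if s > threshold], scores


X = [[0.1, 5.0], [0.2, 5.0], [0.9, 5.0], [1.0, 5.0]]
y = [0, 0, 1, 1]
keep, scores = select_features(X, y, threshold=0.1)
print(keep)  # [0]: the constant second feature carries no label information
```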

  • Research Article
  • Citations: 15
  • 10.1007/s00521-021-06239-5
Local-aware spatio-temporal attention network with multi-stage feature fusion for human action recognition
  • Jul 11, 2021
  • Neural Computing and Applications
  • Yaqing Hou + 6 more

In the study of human action recognition, two-stream networks have recently made excellent progress. However, distinguishing similar human actions in videos remains challenging. This paper proposes a novel local-aware spatio-temporal attention network with multi-stage feature fusion based on compact bilinear pooling for human action recognition. Specifically, with two-stream networks as the backbone, the spatial network first employs multiple spatial transformer networks in parallel to locate the discriminative regions related to human actions. We then fuse the local and global features to enhance the human action representation. Furthermore, the output of the spatial network and the temporal information are fused at a particular layer to learn pixel-wise correspondences. After that, the three outputs are combined to generate global descriptors of human actions. To verify the efficacy of the proposed approach, comparison experiments are conducted against traditional hand-engineered IDT algorithms, classical machine learning methods (e.g., SVM) and state-of-the-art deep learning methods (e.g., spatio-temporal multiplier networks). According to the results, our approach obtains the best performance among existing works, with accuracies of 95.3% and 72.9% on UCF101 and HMDB51, respectively. These results demonstrate the effectiveness of the proposed architecture for human action recognition.
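Compact bilinear pooling, used above for fusion, approximates the outer product of two feature vectors in a fixed low dimension. One common way to do this is the Tensor Sketch projection. Each vector is count-sketched with random hashes and signs, and the two sketches are circularly convolved. The result equals a count sketch of the full outer product. Implementations normally perform the convolution with FFTs; the direct O(d²) loop here keeps the sketch dependency-free. The hash construction, seeds and output size `d` are assumptions, and this is not the authors' code.

```python
import random


def make_hash(n_in, d, seed):
    """Random bucket index and random sign for each input coordinate."""
    rng = random.Random(seed)
    buckets = [rng.randrange(d) for _ in range(n_in)]
    signs = [rng.choice((-1, 1)) for _ in range(n_in)]
    return buckets, signs


def count_sketch(x, buckets, signs, d):
    out = [0.0] * d
    for value, bucket, sign in zip(x, buckets, signs):
        out[bucket] += sign * value
    return out


def circular_convolve(a, b):
    """Direct circular convolution (an FFT product is the usual fast route)."""
    d = len(a)
    return [sum(a[i] * b[(k - i) % d] for i in range(d)) for k in range(d)]


def compact_bilinear(x, y, d, seed=0):
    """Tensor Sketch of the outer product x (outer) y, projected to d dimensions."""
    h1, s1 = make_hash(len(x), d, seed)
    h2, s2 = make_hash(len(y), d, seed + 1)
    return circular_convolve(count_sketch(x, h1, s1, d), count_sketch(y, h2, s2, d))


spatial = [1.0, -2.0, 0.5]
temporal = [3.0, 1.0]
print(compact_bilinear(spatial, temporal, d=5))
```

The appeal is size: the full outer product of two 512-dimensional features has 262,144 entries, while the sketch keeps `d` entries and preserves inner products in expectation.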

  • Research Article
  • Citations: 30
  • 10.1016/j.inffus.2023.102211
Human centric attention with deep multiscale feature fusion framework for activity recognition in Internet of Medical Things
  • Dec 28, 2023
  • Information Fusion
  • Altaf Hussain + 4 more

  • Research Article
  • Citations: 10
  • 10.1007/s11042-017-5165-0
Discriminative multi-task multi-view feature selection and fusion for multimedia analysis
  • Sep 6, 2017
  • Multimedia Tools and Applications
  • Ziwei Yang + 3 more

Multimedia content analysis and understanding, such as action recognition and image classification, is a fundamental research problem. One effective strategy for improving performance is to design discriminative visual representations, for example by combining multiple feature sets. However, simply combining these features may cause high dimensionality and introduce noise. Feature selection and fusion are common choices for multiple-feature representation. At the same time, many studies have shown multi-task feature learning to be effective. In this paper, we propose a multi-task multi-view feature selection and fusion method that chooses and fuses discriminative features. For discriminative feature selection, we learn the selection matrix W by minimizing a trace ratio objective function. For multiple tasks, we employ ℓ2,1-norm regularization to handle each single task while sharing information among tasks. For multiple-feature fusion, we incorporate the local structure of each view in the Laplacian matrix. Since the Laplacian matrix is constructed in an unsupervised manner and the scaled category indicator matrix is solved iteratively, our method is fully unsupervised. Experimental results on four action recognition datasets and five image classification datasets demonstrate the effectiveness of multi-task multi-view feature selection and fusion.
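This abstract relies on two ingredients: the ℓ2,1 norm and a graph Laplacian built from local structure. The ℓ2,1 norm of W is the sum of the ℓ2 norms of its rows. Penalizing it pushes whole rows toward zero, so features can be ranked by row norm. The Laplacian L = D - A encodes neighborhood relations between samples. The minimal sketch below assumes a symmetric k-nearest-neighbor graph with binary weights. The paper's weighting, per-view construction and solver are not specified here, and the helper names are hypothetical.

```python
import math


def row_norms(W):
    return [math.sqrt(sum(w * w for w in row)) for row in W]


def l21_norm(W):
    """Sum of the l2 norms of the rows of W."""
    return sum(row_norms(W))


def rank_features(W):
    """Feature indices (rows of W) ordered by decreasing row norm."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: -norms[i])


def knn_laplacian(X, k):
    """Unnormalized Laplacian L = D - A of a symmetrized binary k-NN graph."""
    n = len(X)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted((sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
                       for j in range(n) if j != i)
        for _, j in dists[:k]:
            A[i][j] = A[j][i] = 1.0  # symmetrize the neighbor relation
    D = [sum(row) for row in A]
    return [[(D[i] if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]


W = [[0.0, 1.0], [3.0, 4.0], [1.0, 0.0]]
print(l21_norm(W), rank_features(W))  # 7.0 [1, 0, 2]
```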

  • Conference Article
  • Citations: 7
  • 10.1109/icme.2015.7177432
Discriminative multi-view feature selection and fusion
  • Jun 1, 2015
  • Yanbin Liu + 2 more

In computer vision tasks such as action recognition and image classification, combining multiple visual feature sets has proven to be an effective strategy. However, simply combining these features may cause high dimensionality and introduce noise. Feature selection and fusion are common choices for multiple-feature representation. In this paper, we propose a multi-view feature selection and fusion method that chooses and fuses discriminative features from multiple feature sets. For discriminative feature selection, we learn the selection matrix W by minimizing the trace ratio objective function with ℓ2,1-norm regularization. For multiple-feature fusion, we incorporate the local structure of each view in the Laplacian matrix. Since the Laplacian matrix is constructed in an unsupervised manner and the scaled category indicator matrix is solved iteratively, our method is fully unsupervised. Experimental results on four action recognition datasets and two large-scale image classification datasets demonstrate the effectiveness of multi-view feature selection and fusion.
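The trace ratio criterion mentioned above can be illustrated in its simplest, subset-level form. This is a simplified stand-in, not the paper's objective. Restrict W to a 0/1 column selection; then tr(WᵀAW) / tr(WᵀBW) reduces to sum(a_i) / sum(b_i) over the chosen features. A standard iteration solves it: compute the current ratio λ, re-rank features by a_i - λ·b_i, keep the top m, and repeat until the set stops changing. The sketch maximizes the ratio, the usual convention; whether one minimizes or maximizes depends on how A and B are defined. The per-feature scores `a_diag` and `b_diag` are hypothetical inputs, and the paper's ℓ2,1 term and Laplacian-based unsupervised formulation are not reproduced.

```python
def subset_ratio(a_diag, b_diag, subset):
    """tr(W^T A W) / tr(W^T B W) when W selects the feature columns in `subset`."""
    return sum(a_diag[i] for i in subset) / sum(b_diag[i] for i in subset)


def trace_ratio_select(a_diag, b_diag, m, max_iter=100):
    """Pick m features maximizing the subset trace ratio (b_diag entries must be > 0)."""
    features = range(len(a_diag))
    # Start from the m features with the best individual ratios.
    selected = sorted(features, key=lambda i: -a_diag[i] / b_diag[i])[:m]
    for _ in range(max_iter):
        lam = subset_ratio(a_diag, b_diag, selected)
        # Re-rank by a_i - lam * b_i and keep the top m; the ratio never decreases.
        candidate = sorted(features, key=lambda i: -(a_diag[i] - lam * b_diag[i]))[:m]
        if set(candidate) == set(selected):
            break
        selected = candidate
    return sorted(selected), subset_ratio(a_diag, b_diag, selected)


print(trace_ratio_select([5.0, 1.0, 4.0, 2.0], [1.0, 1.0, 2.0, 1.0], m=2))  # ([0, 3], 3.5)
```

At convergence no subset of size m scores above zero on sum(a_i - λ·b_i), which is why the stopping point is the best achievable ratio.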

  • Research Article
  • Citations: 22
  • 10.1016/j.image.2020.115802
Human action recognition toward massive-scale sport sceneries based on deep multi-model feature fusion
  • Jan 23, 2020
  • Signal Processing: Image Communication
  • Ersan Zhou + 1 more
