Advancing Human Action Recognition: Wavelet-DTW Enhanced Deep Learning with Multi-Head Attention

Abstract


Similar Papers
  • Research Article
  • Cited by 20
  • 10.1093/database/baz054
Chemical-protein interaction extraction via contextualized word representations and multihead attention.
  • Jan 1, 2019
  • Database
  • Yijia Zhang + 4 more

A rich source of chemical–protein interactions (CPIs) is locked in the exponentially growing biomedical literature. Automatic extraction of CPIs is a crucial task in biomedical natural language processing (NLP), with great benefits for pharmacological and clinical research. Deep context representation and multihead attention are recent developments in deep learning and have shown their potential in several NLP tasks. Unlike traditional word embeddings, deep context representation can generate comprehensive sentence representations based on the sentence context. The multihead attention mechanism can effectively learn important features from different heads and emphasize the relatively important ones. Integrating deep context representation and multihead attention with a neural network-based model may improve CPI extraction. We present a deep neural model for CPI extraction based on deep context representation and multihead attention. Our model mainly consists of three parts: a deep context representation layer, a bidirectional long short-term memory network (Bi-LSTM) layer and a multihead attention layer. The deep context representation is employed to provide more comprehensive feature input for the Bi-LSTM, and the multihead attention effectively emphasizes the important parts of the Bi-LSTM output. We evaluated our method on the public ChemProt corpus. The experimental results show that both deep context representation and multihead attention are helpful in CPI extraction, and our method is competitive with other state-of-the-art methods on the ChemProt corpus.
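As a rough illustration of the multi-head attention step this and several of the abstracts below describe, here is a minimal NumPy sketch of multi-head self-attention applied to stand-in Bi-LSTM outputs. The random projection matrices stand in for learned weights, and all shapes are illustrative assumptions, not any paper's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(h, num_heads, rng):
    """Self-attention over sequence features h of shape (seq_len, d_model)."""
    seq_len, d_model = h.shape
    d_head = d_model // num_heads
    # Random projections stand in for learned W_Q, W_K, W_V.
    w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                     for _ in range(3))
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    # Split into heads: (num_heads, seq_len, d_head).
    split = lambda x: x.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # Scaled dot-product attention per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    ctx = softmax(scores) @ v                              # (heads, seq, d_head)
    # Concatenate heads back to (seq_len, d_model).
    return ctx.transpose(1, 0, 2).reshape(seq_len, d_model)

rng = np.random.default_rng(0)
h = rng.standard_normal((12, 64))   # 12 tokens, 64-dim Bi-LSTM states
out = multi_head_attention(h, num_heads=8, rng=rng)
print(out.shape)  # (12, 64)
```

Each head attends over the full sequence in its own subspace, which is what lets the mechanism "emphasize the relatively important features" from different representational views.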

  • Research Article
  • Cited by 1
  • 10.3233/thc-241064
Revolutionizing health monitoring: Integrating transformer models with multi-head attention for precise human activity recognition using wearable devices.
  • Jan 1, 2025
  • Technology and health care : official journal of the European Society for Engineering and Medicine
  • Anandhavalli Muniasamy

A daily activity routine is vital for overall health and well-being, supporting physical and mental fitness. Consistent physical activity is linked to a multitude of benefits for the body, mind, and emotions, playing a key role in promoting a healthy lifestyle. Wearable devices have become essential in the realm of health and fitness, facilitating the monitoring of daily activities. While convolutional neural networks (CNNs) have proven effective, challenges remain in quickly adapting to a variety of activities. This study aimed to revolutionize health monitoring by developing a model that integrates transformer models with multi-head attention for precise human activity recognition using wearable devices. The Human Activity Recognition (HAR) algorithm uses deep learning to classify human activities from spectrogram data. It uses a pretrained convolutional neural network (CNN) with a MobileNetV2 model to extract features, a dense residual transformer network (DRTN), and a multi-head multi-level attention architecture (MH-MLA) to capture time-related patterns. The model then blends information from both layers through an adaptive attention mechanism and uses a softmax function to produce classification probabilities for the various human activities. The integrated approach, combining a pretrained CNN with transformer models into a thorough and effective system for recognizing human activities from spectrogram data, outperformed baseline methods across several datasets, achieving accuracies of 92.81%, 97.98%, and 95.32% on HARTH, KU-HAR, and HuGaDB, respectively. This suggests that integrating diverse methodologies yields good results in capturing nuanced human activities. The comparative analysis showed that the integrated system consistently performs better on dynamic human activity recognition datasets.
In conclusion, maintaining a routine of daily activities is crucial for overall health and well-being. Regular physical activity contributes substantially to a healthy lifestyle, benefiting both the body and the mind. The integration of wearable devices has simplified the monitoring of daily routines. This research introduces an innovative approach to human activity recognition, combining the CNN model with a dense residual transformer network (DRTN) and multi-head multi-level attention (MH-MLA) within the transformer architecture to enhance its capability.
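The abstract's "adaptive attention mechanism" that blends the CNN and transformer streams is not specified in detail; one plausible minimal form is a learned sigmoid gate over the concatenated features. The sketch below assumes that form, and the gate, dimensions, and class count are all illustrative assumptions, not the paper's design.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_fusion(cnn_feat, transformer_feat, gate_w):
    """Blend two feature streams with a scalar sigmoid gate (assumed form)."""
    # The gate decides how much weight each stream receives.
    z = gate_w @ np.concatenate([cnn_feat, transformer_feat])
    alpha = 1.0 / (1.0 + np.exp(-z))
    return alpha * cnn_feat + (1.0 - alpha) * transformer_feat

rng = np.random.default_rng(1)
cnn_feat = rng.standard_normal(128)      # e.g. a MobileNetV2 embedding
trans_feat = rng.standard_normal(128)    # e.g. a DRTN/MH-MLA embedding
gate_w = rng.standard_normal(256)
fused = adaptive_fusion(cnn_feat, trans_feat, gate_w)
# Softmax head over 6 hypothetical activity classes.
probs = softmax(rng.standard_normal((6, 128)) @ fused)
print(round(probs.sum(), 6))  # 1.0
```

The softmax at the end is the step the abstract does name explicitly: it turns the fused features into classification probabilities over the activity classes.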

  • Research Article
  • 10.11591/ijai.v13.i4.pp4747-4756
Sentiment analysis of student’s comments using long short-term memory with multi head attention
  • Dec 1, 2024
  • IAES International Journal of Artificial Intelligence (IJ-AI)
  • Bhavana Prasanjeet Bhagat + 2 more

Classroom teaching is a viable and effective approach for enhancing student learning and promoting engagement in the educational process. The opinions of students play a vital role in the evaluation of teachers. This paper presents a comprehensive overview of sentiment analysis techniques based on recent research and subsequently explores machine learning (ensemble classifiers), deep learning (long short-term memory (LSTM), convolutional neural network (CNN), LSTM with single attention, and LSTM with multi-head attention), and feature extraction techniques (TF-IDF and Word2Vec) in the context of sentiment analysis over student opinion datasets, i.e., the Vietnamese student feedback corpus, as well as data collected from final-year students' comments in 2023. Further, the Vietnamese student feedback corpus is translated to English and pre-processed with the proposed framework, which yields interesting facts about the capabilities and deficiencies of the different methods. In this paper, we conducted experiments with ensemble classifiers, LSTM and CNN, LSTM with single attention, and LSTM with multi-head attention. We conclude that LSTM with multi-head attention achieves an accuracy of 95.57%, outperforming the other three baseline methods and earlier studies.

  • Research Article
  • 10.1007/s10462-025-11115-y
Word embedding factor based multi-head attention
  • Jan 30, 2025
  • Artificial Intelligence Review
  • Zhengren Li + 4 more

The natural language processing (NLP) field has made significant progress using deep learning models based on multi-head attention mechanisms, such as Transformer and BERT. However, this approach has two major limitations: first, the number of heads is often set manually based on empirical experience, and second, the mechanism offers limited semantic understanding and interpretability. In this study, we propose a novel attention mechanism called Factor Analysis-based Multi-head (FAM) Attention, which combines the theory of exploratory factor analysis with word embeddings. The experimental results demonstrate that FAM Attention achieves better performance with fewer parameters than traditional methods, while also providing better semantic understanding and interpretability at the token level. This also has significant implications for current Large Language Models (LLMs), particularly in effectively reducing parameter counts and enhancing performance.
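The paper's core idea, deriving the head count from latent factors in the embeddings rather than setting it by hand, can be gestured at with a much simpler stand-in: the Kaiser criterion (count eigenvalues of the correlation matrix above 1). This is NOT the paper's exploratory-factor-analysis procedure, only a rough sketch of estimating latent structure from embeddings.

```python
import numpy as np

def kaiser_factor_count(embeddings):
    """Estimate latent factor count via the Kaiser criterion:
    eigenvalues of the correlation matrix greater than 1."""
    corr = np.corrcoef(embeddings, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)
    return int((eigvals > 1.0).sum())

rng = np.random.default_rng(0)
# 500 "words" whose 32-dim embeddings are driven by 4 latent factors + noise.
latent = rng.standard_normal((500, 4))
loadings = rng.standard_normal((4, 32))
emb = latent @ loadings + 0.1 * rng.standard_normal((500, 32))
print(kaiser_factor_count(emb))
```

With a strong 4-factor structure the criterion recovers a count near 4, illustrating how latent structure in embeddings could inform the number of attention heads.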

  • Conference Article
  • Cited by 13
  • 10.1109/icassp39728.2021.9414877
Double Multi-Head Attention for Speaker Verification
  • Jun 6, 2021
  • Miquel India + 2 more

Most state-of-the-art deep learning systems for text-independent speaker verification are based on speaker embedding extractors. These architectures are commonly composed of a feature extractor front-end together with a pooling layer that encodes variable-length utterances into fixed-length speaker vectors. In this paper we present Double Multi-Head Attention (MHA) pooling, which extends our previous approach based on Self MHA. An additional self-attention layer is added to the pooling layer that summarizes the context vectors produced by MHA into a unique speaker representation. This method enhances the pooling mechanism by weighting the information captured by each head, resulting in more discriminative speaker embeddings. We evaluated our approach on the VoxCeleb2 dataset. Our results show 6.09% and 5.23% relative improvements in EER compared to Self Attention pooling and Self MHA, respectively. According to the obtained results, Double MHA proves to be an excellent approach for efficiently selecting the most relevant features captured by the CNN-based front-ends from the speech signal.
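The two-level pooling idea reads well as code: pool each head over time, then pool over the heads themselves. The sketch below uses plain self-attention pooling at both levels, with learned projections omitted and random vectors standing in for learned queries; it is a simplification of the paper's mechanism, not its implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_pool(x, u):
    """Weight the rows of x by their similarity to a query u, then sum."""
    w = softmax(x @ u)          # one scalar weight per row
    return w @ x

def double_mha_pool(frames, num_heads, rng):
    """frames: (T, d) frame-level features from a CNN front-end."""
    t, d = frames.shape
    d_head = d // num_heads
    heads = frames.reshape(t, num_heads, d_head)
    # Level 1: attention pooling within each head, over time.
    u_time = rng.standard_normal(d_head)
    head_ctx = np.stack([self_attention_pool(heads[:, h, :], u_time)
                         for h in range(num_heads)])   # (H, d_head)
    # Level 2: attention pooling over the per-head context vectors.
    u_head = rng.standard_normal(d_head)
    return self_attention_pool(head_ctx, u_head)       # fixed-length embedding

rng = np.random.default_rng(0)
emb = double_mha_pool(rng.standard_normal((200, 256)), num_heads=8, rng=rng)
print(emb.shape)  # (32,)
```

The second attention layer is what gives each head a learned weight in the final speaker embedding, which is the paper's stated motivation.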

  • Research Article
  • Cited by 52
  • 10.1016/j.jenvman.2023.117759
Accurate multi-objective prediction of CO2 emission performance indexes and industrial structure optimization using multihead attention-based convolutional neural network
  • Mar 21, 2023
  • Journal of Environmental Management
  • Fenger Wu + 4 more


  • Research Article
  • 10.31127/tuje.1695208
Comparative Study of BiGRU with Multi-Head Attention and CNN for Network Intrusion Detection Using a Cleaned and Balanced CSE-CIC-IDS 2018 Dataset
  • Oct 8, 2025
  • Turkish Journal of Engineering
  • Suresh Kumar Balasubramanian + 1 more

In the age of advanced cyber attacks, robust intrusion detection systems are essential to protect networks. This work presents a new comparative performance evaluation of two deep learning models, namely, Bidirectional Gated Recurrent Unit with Multi-Head Attention (BiGRU + MHA) and Convolutional Neural Network (CNN), on the updated CSE-CIC-IDS 2018 dataset (Version 1, 2024). The dataset was meticulously cleaned and balanced by eliminating duplicate entries and applying a two-stage resampling method that combines random undersampling with synthetic minority oversampling, ensuring accurate representation of both frequent and infrequent attack types. The experimental results confirm that both models delivered strong detection performance, with BiGRU + MHA consistently outperforming CNN. Specifically, BiGRU + MHA achieved 99.65% accuracy and a ROC AUC of 99.71%, whereas CNN achieved 98.85% accuracy and a ROC AUC of 98.92%. These observations highlight the advantage of combining temporal sequence modeling with attention for identifying advanced intrusion patterns in network traffic. Overall, the results confirm that deep temporal learning combined with structured data preparation can yield highly effective intrusion detection, with strong potential for strengthening cybersecurity solutions.
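The two-stage resampling the abstract describes can be sketched in a few lines: undersample the majority class, then generate synthetic minority samples by interpolation. The version below interpolates between random minority pairs rather than k-nearest neighbors, so it is a simplified SMOTE-style stand-in, not the study's actual procedure.

```python
import numpy as np

def two_stage_resample(x_major, x_minor, target, rng):
    """Balance two classes to a common target size (simplified sketch)."""
    # Stage 1: random undersampling of the majority class.
    keep = rng.choice(len(x_major), size=target, replace=False)
    x_major = x_major[keep]
    # Stage 2: synthetic minority oversampling -- each synthetic point
    # lies on the segment between two randomly chosen minority samples.
    need = target - len(x_minor)
    i = rng.integers(0, len(x_minor), size=need)
    j = rng.integers(0, len(x_minor), size=need)
    lam = rng.random((need, 1))
    synth = x_minor[i] + lam * (x_minor[j] - x_minor[i])
    return x_major, np.vstack([x_minor, synth])

rng = np.random.default_rng(0)
maj, mino = two_stage_resample(rng.standard_normal((1000, 5)),
                               rng.standard_normal((40, 5)), 200, rng)
print(maj.shape, mino.shape)  # (200, 5) (200, 5)
```

In practice a library such as imbalanced-learn would handle the k-NN-based SMOTE step; the point here is only the two-stage shape of the pipeline.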

  • Research Article
  • Cited by 192
  • 10.1007/s12652-020-02761-x
Improving time series forecasting using LSTM and attention models
  • Jan 3, 2021
  • Journal of Ambient Intelligence and Humanized Computing
  • Hossein Abbasimehr + 1 more

Accurate time series forecasting has been recognized as an essential task in many application domains. Real-world time series data often contain non-linear patterns whose complexity prevents conventional forecasting techniques from making accurate predictions. To forecast a given time series accurately, this study proposes a hybrid model based on two deep learning methods: long short-term memory (LSTM) and multi-head attention. The proposed method leverages the learned representations from both techniques. Its performance is compared with standard time series forecasting techniques as well as hybrid models proposed in the related literature, using 16 datasets. Moreover, individual models based on LSTM and multi-head attention are implemented for a comprehensive evaluation. The experimental results indicate that the proposed model outperforms all benchmark methods on most datasets in terms of symmetric mean absolute percentage error (SMAPE) and yields the best average rank (AR) among the evaluated methods. In addition, the results reveal that the model based on multi-head attention is the second-best method with regard to AR, which demonstrates the predictive power of the attention mechanism in time series forecasting.
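SMAPE, the headline metric in this study, has several conventions in the literature; one common form divides the absolute error by the mean of the absolute actual and forecast values. The snippet below uses that convention, which may differ from the paper's exact variant.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.
    Convention: error divided by the mean of |actual| and |forecast|."""
    num = np.abs(forecast - actual)
    den = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(num / den)

actual = np.array([100.0, 110.0, 120.0])
forecast = np.array([90.0, 115.0, 126.0])
print(round(smape(actual, forecast), 2))  # 6.62
```

Unlike plain MAPE, this form is bounded (at 200%) and treats over- and under-forecasting symmetrically, which is why it is popular for comparing methods across many datasets.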

  • Conference Article
  • Cited by 76
  • 10.1109/icassp40776.2020.9054073
Multi-Head Attention for Speech Emotion Recognition with Auxiliary Learning of Gender Recognition
  • May 1, 2020
  • Anish Nediyanchath + 2 more

The paper presents a multi-head attention deep learning network for Speech Emotion Recognition (SER) using log mel-filter bank energies (LFBE) spectral features as input. The multi-head attention, together with position embedding, jointly attends to information from different representations of the same LFBE input sequence. The position embedding helps attend to the dominant emotion features by identifying the positions of features in the sequence. In addition to multi-head attention and position embedding, we apply multi-task learning with gender recognition as an auxiliary task. The auxiliary task helps in learning the gender-specific features that influence the emotion characteristics of speech and results in improved accuracy on Speech Emotion Recognition, the primary task. We conducted all our experiments on the IEMOCAP dataset. We achieve an overall accuracy of 76.4% and an average class accuracy of 70.1%, which are 5.3% and 6.2% higher, respectively, than the state-of-the-art models available on SER for four emotion classes.
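Multi-task learning with an auxiliary task usually comes down to a weighted sum of per-task losses. The sketch below shows that shape for an emotion head plus a gender head; the 0.3 auxiliary weight and the logit values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def multitask_loss(emotion_logits, emotion_label,
                   gender_logits, gender_label, aux_weight=0.3):
    """Primary emotion loss plus a down-weighted auxiliary gender loss."""
    return (cross_entropy(emotion_logits, emotion_label)
            + aux_weight * cross_entropy(gender_logits, gender_label))

# Four emotion classes (primary) and two gender classes (auxiliary).
loss = multitask_loss(np.array([2.0, 0.5, 0.1, -1.0]), 0,
                      np.array([1.5, -0.5]), 1)
print(loss > 0)  # True
```

Both heads share the same encoder in the paper's setup, so gradients from the auxiliary gender loss shape the shared features that the emotion head also uses.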

  • Research Article
  • Cited by 72
  • 10.1016/j.specom.2020.10.004
Masked multi-head self-attention for causal speech enhancement
  • Oct 29, 2020
  • Speech Communication
  • Aaron Nicolson + 1 more


  • Research Article
  • Cited by 21
  • 10.1016/j.ins.2023.03.058
A novel two-level interactive action recognition model based on inertial data fusion
  • Mar 15, 2023
  • Information Sciences
  • Sen Qiu + 7 more


  • Research Article
  • Cited by 8
  • 10.1016/j.eswa.2024.123674
Modeling vehicle U-turning behavior near intersections: A deep learning approach based on TCN and multi-head attention
  • Mar 17, 2024
  • Expert Systems with Applications
  • Weiliang Zeng + 4 more


  • Research Article
  • Cited by 23
  • 10.1186/s12859-023-05447-1
MCL-DTI: using drug multimodal information and bi-directional cross-attention learning method for predicting drug–target interaction
  • Aug 26, 2023
  • BMC Bioinformatics
  • Ying Qian + 3 more

Background: Prediction of drug–target interaction (DTI) is an essential step for drug discovery and drug repositioning. Traditional methods are mostly time-consuming and labor-intensive; deep learning-based methods address these limitations and have been widely applied. Most current deep learning methods employ representation learning of unimodal information such as SMILES sequences, molecular graphs, or molecular images of drugs. In addition, most methods focus on feature extraction from the drug and the target alone, without fusion learning from the interacting drug–target pair, which may lead to insufficient feature representation.
Motivation: In order to capture more comprehensive drug features, we utilize both the molecular image and the chemical features of drugs. The image of the drug mainly carries its structural information and spatial features, while the chemical information includes its functions and properties; these complement each other, making the drug representation more effective and complete. Meanwhile, to enhance interactive feature learning between drug and target, we introduce a bidirectional multi-head attention mechanism to improve DTI performance.
Results: To enhance feature learning between drugs and targets, we propose a novel deep learning model for the DTI task, called MCL-DTI, which uses multimodal drug information and learns the representation of the drug–target interaction for prediction. To further explore a more comprehensive representation of drug features, this paper first exploits two modalities of drug information, molecular image and chemical text, to represent the drug. We also introduce a bidirectional multi-head cross attention (MCA) method to learn the interrelationships between drugs and targets. Thus, we build two decoders, each comprising a multi-head self-attention (MSA) block and an MCA block, for cross-information learning. We use a separate decoder for the drug and the target to obtain the interaction feature maps. Finally, we feed the feature maps generated by the decoders into a fusion block for feature extraction and output the prediction results.
Conclusions: MCL-DTI achieves the best results on all three datasets: Human, C. elegans, and Davis, including the balanced datasets and an unbalanced dataset. The results on the drug–drug interaction (DDI) task show that MCL-DTI has strong generalization capability and can be easily applied to other tasks.
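Cross-attention is the piece that distinguishes this model from the self-attention examples elsewhere on this page: queries come from one modality while keys and values come from the other. A single-head sketch with learned projections omitted (shapes and feature sizes are illustrative, not MCL-DTI's actual dimensions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Single-head cross-attention: one modality queries the other."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (n_q, n_kv)
    return softmax(scores) @ keys_values            # (n_q, d)

rng = np.random.default_rng(0)
drug = rng.standard_normal((30, 64))     # e.g. drug token features
target = rng.standard_normal((50, 64))   # e.g. protein residue features
drug_ctx = cross_attention(drug, target)      # drug attends to target
target_ctx = cross_attention(target, drug)    # target attends to drug
print(drug_ctx.shape, target_ctx.shape)  # (30, 64) (50, 64)
```

Running it in both directions, as above, is what "bidirectional" means here: each side's features are re-expressed in terms of the other side's, before the fusion block.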

  • Research Article
  • 10.33096/ilkom.v17i2.2843.150-161
Sentiment Analysis towards Jokowi Post-Presidential Term Using CNN-BiLSTM with Multi-head Attention on Platform X
  • Aug 19, 2025
  • ILKOM Jurnal Ilmiah
  • Muhammad Rizki Setyawan + 2 more

The development of social media has changed the way the public expresses political opinions, especially regarding the evaluation of President Joko Widodo's (Jokowi) leadership after his term. Platform X (formerly Twitter) has become the primary source of public opinion data, but the use of informal language and sarcasm makes accurate sentiment analysis challenging. This study develops a deep learning sentiment analysis model with a CNN-BiLSTM structure and a multi-head attention mechanism. The dataset consists of 52,643 tweets that were labeled and embedded using IndoBERT. To address class imbalance, the SMOTE method was applied to the training data, enabling the model to better learn from minority classes. The results indicate that the model achieves a high accuracy of 98.78%, with an average precision, recall, and F1-score of 0.98. These findings show that the model is not only accurate but also reliable in distinguishing each sentiment class. A comparison with other model variants suggests that the complete combination of CNN-BiLSTM and multi-head attention delivers the best performance, although the improvement is relatively small.

  • Research Article
  • 10.3390/s24216813
Dynamic Temporal Denoise Neural Network with Multi-Head Attention for Fault Diagnosis Under Noise Background
  • Oct 23, 2024
  • Sensors (Basel, Switzerland)
  • Zhongzhi Li + 4 more

Fault diagnosis plays a crucial role in maintaining the operational safety of mechanical systems. As intelligent data-driven approaches evolve, deep learning (DL) has emerged as a pivotal technique in fault diagnosis research. However, the vibration signals collected from mechanical systems are usually corrupted by unrelated noise due to complicated transfer path modulations and component coupling. To solve these problems, this paper proposes the dynamic temporal denoise neural network with multi-head attention (DTDNet). First, the model transforms one-dimensional signals into two-dimensional tensors based on the periodic self-similarity of the signals, employing multi-scale two-dimensional convolution kernels to extract signal features both within and across periods. Second, to address the lack of a denoising structure in traditional convolutional neural networks, a temporal variable denoise (TVD) module with dynamic nonlinear processing is proposed to filter the noise. Lastly, a multi-head attention fusion (MAF) module is used to weight the denoised features of signals with different periods. Evaluation on two datasets, the Case Western Reserve University bearing dataset (single sensor) and a real aircraft sensor dataset (multiple sensors), demonstrates that DTDNet can reduce unwanted noise in signals and achieves a remarkable improvement in classification performance compared with state-of-the-art methods. DTDNet provides a high-performance solution for the noise that may occur in actual fault diagnosis tasks, which has important application value.
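The first step of DTDNet, folding a 1-D signal into a 2-D tensor along its period, is simple to show concretely. The sketch below assumes the period is known in advance (the paper derives it from the signal's periodic self-similarity; the sine signal and period here are illustrative).

```python
import numpy as np

def fold_by_period(signal, period):
    """Fold a 1-D vibration signal into a (cycles x period) 2-D tensor so
    2-D convolutions can see within-period and across-period structure."""
    n = (len(signal) // period) * period   # drop the incomplete last cycle
    return signal[:n].reshape(-1, period)

sig = np.sin(2 * np.pi * np.arange(1030) / 103)  # period of 103 samples
img = fold_by_period(sig, 103)
print(img.shape)  # (10, 103)
```

After folding, each row is one cycle, so a convolution kernel's horizontal extent reads within-period features and its vertical extent reads across-period trends, which is exactly the dual view the abstract describes.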
