Articles published on Self-attention Mechanism
5029 Search results
- New
- Research Article
- 10.3390/s26031055
- Feb 6, 2026
- Sensors
- Xinzhao Li + 3 more
In recent years, the methods based on convolutional neural networks have achieved significant progress in hyperspectral image super-resolution. However, existing methods still face two key challenges: (1) they fail to fully extract edge detail information from hyperspectral images; (2) they struggle to simultaneously capture local and global features. To address these issues, we propose an Edge-Distilled and Local–Global Feature Selection network (EDLGFS) for hyperspectral image super-resolution. This network aims to effectively leverage edge details and local–global features, thereby enhancing super-resolution reconstruction quality. Firstly, we design an edge-guided super-resolution network based on knowledge distillation. This network transfers edge knowledge to improve the reconstruction. Secondly, we propose a Local–Global Feature Selection mechanism (LGFS), which integrates convolutions of different sizes with the self-attention mechanism. This design models spatial correlations across features with different receptive fields, achieving efficient feature selection to more effectively capture local and global features. Finally, we propose a dynamic loss mechanism to more effectively balance the contribution of each loss term. Extensive experimental results on three public datasets demonstrate that the proposed EDLGFS achieves superior super-resolution reconstruction quality.
- New
- Research Article
- 10.1038/s41598-026-37052-6
- Feb 5, 2026
- Scientific Reports
- Zeran Wang + 5 more
Object detection, a cornerstone of computer vision, aims to localize and classify objects within images. This comprehensive survey reviews modern object detection methods, focusing on two dominant paradigms: Convolutional Neural Networks (CNNs) and Transformer-based architectures. This work provides a structured comparison of CNN-based and Transformer-based detection paradigms, highlighting their complementary strengths and trade-offs. CNNs demonstrate advantages in local feature extraction and computational efficiency, whereas Transformers excel at capturing global context through self-attention mechanisms. We also analyze multi-modal fusion techniques integrating Red-Green-Blue (RGB), Light Detection and Ranging (LiDAR), and language embeddings. Benchmark results from representative models include: Real-Time Detection Transformer (RT-DETR) achieves 53.1% mean Average Precision (mAP) at Intersection over Union (IoU) at 0.5:0.95, You Only Look Once version 8 (YOLOv8) achieves 50.2% mAP at 0.5:0.95, real-time detectors exceed 100 frames per second (FPS) with competitive accuracy, and specialized infrared methods achieve 92.45% F-measure on NUAA-SIRST dataset. The work introduces a novel taxonomy of multi-modal fusion strategies, documents field-wide and review-specific limitations, and synthesizes recent 2024 to 2025 benchmarks across diverse datasets. Despite these advances, significant challenges remain in handling scale variation, occlusion effects, and domain adaptation. This survey outlines these persistent obstacles and promising research directions, providing a structured reference for researchers and practitioners.
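The "mAP at IoU 0.5:0.95" figures quoted in the abstract above depend on how Intersection over Union is computed and thresholded. The following is a minimal sketch, not the survey's evaluation code: it assumes axis-aligned boxes in `(x1, y1, x2, y2)` form and the COCO-style convention of ten thresholds from 0.50 to 0.95 in steps of 0.05. The function and variable names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matches_per_threshold(pred_box, gt_box):
    """Report, for each threshold, whether the prediction counts as a match."""
    overlap = iou(pred_box, gt_box)
    return {t: overlap >= t for t in IOU_THRESHOLDS}
```

A full mAP computation would additionally rank predictions by confidence, build a precision-recall curve per class at each threshold, and average the resulting AP values; that part is omitted here.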
- New
- Research Article
- 10.1109/tnnls.2026.3656591
- Feb 4, 2026
- IEEE Transactions on Neural Networks and Learning Systems
- Xiangrui Zhang + 5 more
To improve the reliability and interpretability of industrial process monitoring, this article proposes a causal graph spatial-temporal autoencoder (CGSTAE). The network architecture of CGSTAE combines two components: a correlation graph structure learning module based on a spatial self-attention mechanism (SSAM) and a spatial-temporal encoder-decoder module utilizing graph convolutional long short-term memory (GCLSTM). The SSAM learns correlation graphs by capturing dynamic relationships between variables, while a novel three-step causal graph structure learning algorithm is introduced to derive a causal graph from these correlation graphs. The algorithm leverages a reverse perspective of the causal invariance principle to uncover the invariant causal graph from varying correlations. The spatial-temporal encoder-decoder, built with GCLSTM units, reconstructs time series process data within a sequence-to-sequence framework. The proposed CGSTAE enables effective process monitoring and fault detection through two statistics in the feature space and residual space. Finally, we validate the effectiveness of CGSTAE in process monitoring through the Tennessee Eastman process (TEP) and a real-world air separation process (ASP).
- New
- Research Article
- 10.1088/1361-6501/ae41d9
- Feb 4, 2026
- Measurement Science and Technology
- Jingwen Wei + 1 more
In the industrial field, the reliability of rotating machinery has a key impact on production safety and operation efficiency. Current fault prediction and health management methods usually rely on task-specific models, which face significant challenges when dealing with signal characteristics under different operating conditions. Existing lightweight network models have weak noise robustness in noisy environments and lack cross-condition diagnostic ability and generalization. Inspired by the Transformer-CNN (convolutional neural network) collaboration model, this study introduces a novel fault diagnosis model named SMAConvFormer to tackle the aforementioned challenges. First, a multi-scale channel attention embedded separable convolution is proposed to dynamically enhance the feature responses of key channels via channel attention, suppress noise-induced redundancy, and accurately capture multi-scale local receptive field features that represent the early operational stage of mechanical equipment under noisy conditions. Second, a synergistic multi-dimensional self-attention mechanism is proposed, which incorporates broadcast self-attention to model global temporal correlations, multi-scale spatial attention to capture local high-frequency impulsive features, and progressive channel self-attention to optimize channel weight allocation, thereby enabling the collaborative extraction of correlation features across different frequency bands. Noise interference is effectively suppressed, and the model's diagnostic adaptability under varying operating conditions is significantly enhanced. Finally, experiments show that SMAConvFormer outperforms recent fault diagnosis methods in terms of diagnostic performance and generalization ability; in addition, the effectiveness of the proposed modules is also verified.
- New
- Research Article
- 10.31449/inf.v50i5.12777
- Feb 2, 2026
- Informatica
- Mengzhu Yu
Autism Spectrum Disorder (ASD) diagnosis remains challenging because of its heterogeneity and reliance on subjective behavioral assessments. Resting-state functional MRI (fMRI) presents a compelling avenue for identifying objective biomarkers, but decoding its complex spatiotemporal patterns requires advanced computational models. While Deep Learning (DL) approaches have progressed, many struggle to concurrently capture local neural dynamics and global temporal dependencies. To address this, a novel end-to-end CNN-Transformer hybrid framework designed for fMRI-based autism diagnosis is proposed. Our model leverages a convolutional module to extract localized spatiotemporal features, which are then processed by a Transformer encoder to model long-range, global dependencies through a Multi-Head Self-Attention (MHSA) mechanism. Evaluated on the large multi-site ABIDE-I dataset (N=1,035), the proposed model achieved state-of-the-art performance with an accuracy of 77.85%, a sensitivity of 76.52%, a specificity of 78.90%, and an F1-score of 77.71%. Ablation studies confirmed the critical contribution of each architectural component, and comparisons with pre-trained CNNs and other leading methods demonstrated superior and statistically significant performance (p<0.05). Despite an observed performance drop in site-specific evaluations, underscoring the challenge of scanner heterogeneity, our results affirm that the synergistic integration of local feature learning and global contextual modeling is a powerful paradigm for neuroimaging-based diagnostic applications.
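The self-attention step that lets a Transformer encoder model long-range dependencies, as described in the abstract above, can be illustrated with a minimal sketch. This is not the authors' model: it assumes a single head and identity query/key/value projections (a real MHSA layer learns separate projection matrices and runs several heads in parallel), and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    x is a list of n token vectors, each of length d. Assumption: queries,
    keys, and values are all x itself (identity projections).
    Returns (outputs, attention_weights).
    """
    d = len(x[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, x)) for c in range(d)])
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(tokens)
```

Each row of `attn` sums to 1, and every output token mixes information from all positions, which is what allows distant features to interact in a single layer.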
- New
- Research Article
- 10.1016/j.neunet.2025.108152
- Feb 1, 2026
- Neural Networks: The Official Journal of the International Neural Network Society
- Mingzhu Tai + 3 more
Spatial-spectral multi-order gated aggregation network with bidirectional interactive fusion for hyperspectral image classification.
- New
- Research Article
- 10.1016/j.compbiolchem.2025.108713
- Feb 1, 2026
- Computational Biology and Chemistry
- Ibrahim Aruk + 2 more
A comprehensive comparison of convolutional neural network and visual transformer models on skin cancer classification.
- New
- Research Article
- 10.1016/j.compbiolchem.2025.108621
- Feb 1, 2026
- Computational Biology and Chemistry
- Shraddha Jain + 2 more
Lightweight self-attention and deep gated neural network (LSA-DGNet) for multiple neurological disease detection.
- New
- Research Article
- 10.1016/j.compbiolchem.2025.108693
- Feb 1, 2026
- Computational Biology and Chemistry
- Pei Xu + 4 more
Prediction of protein thermostability trends based on the self-attention mechanism driven sparse convolutional network.
- New
- Research Article
- 10.1016/j.hcl.2025.08.007
- Feb 1, 2026
- Hand Clinics
- Ching-Heng Lin + 1 more
A Review of Transformers in Medical Research and Health Care.
- New
- Research Article
- 10.1016/j.jfca.2026.108872
- Feb 1, 2026
- Journal of Food Composition and Analysis
- Meiyuan Chen + 14 more
Advanced black tea fermentation grade identification: Terahertz time-domain spectroscopy coupled with self-attention mechanism guided deep learning
- New
- Research Article
- 10.1016/j.future.2026.108421
- Feb 1, 2026
- Future Generation Computer Systems
- Saihua Cai + 5 more
MD-CGM: Malicious Traffic Detection Model Based on CycleGAN and Multi-Head Self-Attention Mechanism
- New
- Research Article
- 10.1016/j.asoc.2025.114404
- Feb 1, 2026
- Applied Soft Computing
- Bian Xu + 3 more
Real-time location technology of hydrogen energy storage and transportation leakage based on self-attention mechanism group
- New
- Research Article
- 10.1016/j.neunet.2025.108140
- Feb 1, 2026
- Neural Networks: The Official Journal of the International Neural Network Society
- Chunyi Hou + 5 more
Graph-patchformer: Patch interaction transformer with adaptive graph learning for multivariate time series forecasting.
- New
- Research Article
- 10.1088/1741-2552/ae3d68
- Feb 1, 2026
- Journal of Neural Engineering
- Junhao Jia + 4 more
Theoretical and applied research on spatio-temporal graph attention networks for single-trial P300 detection
- New
- Research Article
- 10.3847/1538-3881/ae30e5
- Jan 28, 2026
- The Astronomical Journal
- Jun Zhang + 2 more
The dielectric permittivity is a crucial parameter in planetary ground-penetrating radar (GPR) missions, such as the RIMFAX radar onboard the Mars 2020 Perseverance rover. It characterizes subsurface materials and enables depth interpretation of radargrams. In this study, we develop a deep learning-based approach for inverting dielectric permittivity from Radar Imager for Mars’ Subsurface Experiment (RIMFAX) GPR data. The architecture integrates a convolutional neural network, Bi-LSTM, and a self-attention mechanism, providing a principled framework for leveraging the sequential nature of GPR echoes, capturing long-range subsurface dependencies, and enhancing both the robustness and accuracy of inversions. The input is 1D processed GPR data, and the output is the corresponding 1D dielectric permittivity profile. By combining multiple 1D dielectric permittivity profiles, complex 2D profiles can be constructed. A large volume of synthetic data is used to train the model, allowing it to directly capture the intrinsic relationship between GPR data and dielectric permittivity. The approach is validated on the test set and then applied to the RIMFAX GPR data acquired by the Perseverance rover on Sols 389 and 770. The prediction results effectively reveal key characteristics of the subsurface sedimentary structure, including the number of layers, thicknesses, and the geometry of their contacts. They are broadly consistent with findings reported in prior research, demonstrating the potential and applicability of the approach for dielectric permittivity inversion. However, in such complex planetary radar data, the true dielectric permittivity remains uncertain, and caution is therefore required when using permittivity estimates to infer subsurface structure.
- New
- Research Article
- 10.1038/s41598-026-36456-8
- Jan 28, 2026
- Scientific Reports
- Yuying Zhang + 8 more
To address the semantic gap in physical sensor data for fault diagnosis of heavy-duty railway maintenance machinery and the underuse of semantic information in maintenance logs, this study proposes a model that treats fault-related text as a virtual semantic sensor. The goal is to explore a semantic-aware approach to fault diagnosis and its role in multisensor fusion. A classification model combining a BERT pretrained model with a convolutional neural network (BERT-CNN) was built. To improve the focus on key semantic units and strengthen links between textual features and sensor modalities, a dual self-attention (DSA) mechanism was added, forming the BERT-DSA-CNN model. It extracts structured semantic feature vectors from unstructured logs, which serve as outputs of the virtual semantic sensor. Experiments show that (1) incorporating DSA significantly increases performance, with BERT-DSA-CNN and Word2vec-DSA-CNN outperforming baselines (BERT-CNN and Word2vec-CNN) in terms of accuracy, precision, recall, and F1-score; (2) BERT's contextual embeddings clearly surpass Word2vec, as BERT-DSA-CNN consistently outperforms Word2vec-DSA-CNN; (3) CNN effectively captures local features of short fault texts, as BERT-CNN outperforms BERT-BiLSTM on most metrics; and (4) deep semantic feature learning substantially outperforms traditional machine learning, confirming the superiority of deep semantic feature learning. This study validates that the proposed semantic-aware model can efficiently transform fault texts into semantic features for identification. More importantly, the structured semantic features extracted by this model have the potential to be fused with physical sensor data in future work, which could provide a foundation for more accurate, robust, and interpretable intelligent fault diagnosis systems for heavy-duty railway maintenance machinery.
- New
- Research Article
- 10.3390/app16031285
- Jan 27, 2026
- Applied Sciences
- Junnan Feng + 4 more
The cutterhead torque of a full-face tunnel boring machine (TBM) is a pivotal parameter that characterises the rock-machine interaction. Its dynamic prediction is of considerable significance to achieve intelligent regulation of the boring parameters and enhance the construction efficiency and safety. To achieve high-precision time series prediction of cutterhead torque under complex geological conditions, this study proposes an intelligent prediction method (VBGAP) that integrates a signal decomposition mechanism and physical constraints. At the data preprocessing level, a multi-step data cleaning process is designed. This process comprises the following steps: the processing of invalid values, the detection of outliers, and normalisation. The non-smooth torque time-series signal is decomposed by variational mode decomposition (VMD) into narrow-band sub-signals that serve as a data-driven, frequency-specific input for subsequent modelling, and a hybrid deep learning model based on Bi-GRU and a self-attention mechanism is built for each sub-signal. Finally, the prediction results of each component are linearly superimposed to achieve signal reconstruction. Concurrently, a novel modal energy conservation loss function is proposed, with the objective of effectively constraining the information entropy decay in the decomposition-reconstruction process. The validity of the proposed method is supported by empirical evidence from a real tunnel project dataset in Northeast China, which demonstrates an average accuracy of over 90% in a multi-step prediction task with a time step of 30 s. This suggests that the proposed method exhibits superior adaptability and prediction accuracy in comparison to existing mainstream deep learning models. The findings of the research provide novel concepts and methodologies for the intelligent regulation of TBM boring parameters.
- New
- Research Article
- 10.7717/peerj-cs.3515
- Jan 27, 2026
- PeerJ Computer Science
- Bing Shi + 4 more
Aquaculture water quality parameters are influenced by multiple factors, exhibiting significant temporal and spatial variations. Current prediction methods for these parameters primarily focus on time series predictions at specific observation points, which do not comprehensively characterize the spatiotemporal distribution dynamics of pond water quality parameters. To address this limitation, this study proposes a novel model which incorporates a Self-Attention (SA) mechanism to enhance the capture of long-term dependencies within the data. Furthermore, an enhanced Sparrow Search Algorithm (ESSA) is implemented to optimize the hyperparameters of the long short-term memory (LSTM) network, thereby improving the time series prediction of water quality parameters. Building upon these predictions, the Radial Basis Function (RBF) algorithm is utilized for spatial prediction. The proposed spatiotemporal prediction model, which combines ESSA-SA-LSTM and RBF, demonstrates superior performance by reducing the mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE) of dissolved oxygen (DO) and water temperature spatiotemporal predictions, outperforming existing comparative algorithms. The model presented in this study significantly enhances the accuracy of spatiotemporal predictions for water quality parameters, playing a crucial role in ensuring the safe production and management of aquatic environments in aquaculture.
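The three error measures named in the abstract above (MSE, RMSE, and MAE) have standard definitions, sketched below for reference. This is only an illustration of the metrics, not the authors' evaluation code; the sample values are made up for the example and are not results from the paper.

```python
import math


def mse(y_true, y_pred):
    """Mean square error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the MSE, in the data's own units."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative (made-up) dissolved-oxygen readings and predictions, in mg/L.
observed = [6.0, 6.5, 7.0, 6.8]
predicted = [6.1, 6.4, 7.3, 6.8]
errors = {
    "MSE": mse(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
}
```

Because MSE squares each residual, it penalises occasional large errors more heavily than MAE does, which is one reason the two are usually reported together.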
- New
- Research Article
- 10.1088/1361-6501/ae3e0c
- Jan 27, 2026
- Measurement Science and Technology
- Zhimin Qiu + 4 more
This paper addresses the challenge of insufficient multi-scale feature representation and limited local geometric modeling in sparse point cloud 3D object detection. We propose an efficient sparse voxel detection network, SAVF-Net (Sparse Attention and Voxel Fusion Network). Building on a fully sparse detection framework, SAVF-Net introduces a Neighborhood Sparse Attention (NSA) module and a parallel dual-tower multi-scale voxel fusion module (MSVF-SSM). The NSA module employs a local self-attention mechanism on sparse neighborhood indices, effectively capturing geometric correlations between sparse voxels and enhancing representations of small or boundary objects. The MSVF-SSM module uses parallel sparse convolutions with different dilation rates and a channel-spatial dual-attention mechanism to adaptively fuse global semantic and local geometric features, thereby improving the network's perception of objects across different scales. Experimental results on the KITTI dataset demonstrate that SAVF-Net significantly outperforms mainstream methods. Specifically, SAVF-Net improves the mean average precision of 3D detection and BEV detection by approximately 2.20% and 2.14%, respectively. Notably, it achieves a 2%-4% accuracy gain on small-object categories such as pedestrians and cyclists. Furthermore, field tests on real-world roads validate the model's generalization and stability in complex dynamic scenes. Overall, SAVF-Net achieves improved accuracy and robustness for sparse point cloud 3D detection while preserving its sparse and efficient computation, providing strong support for efficient sensor perception in autonomous driving systems.