Multi-head Self-attention Research Articles

Transformers have shown remarkable performance; however, their architecture design is a time-consuming process that demands expertise and trial and error. Thus, it is worthwhile to investigate efficient methods for automatically searching for high-performance Transformers via Transformer Architecture Search (TAS). To improve search efficiency, training-free proxy-based methods have been widely adopted in Neural Architecture Search (NAS). However, these proxies have been found to generalize poorly to Transformer search spaces, as confirmed by several studies and our own experiments. This paper presents an effective scheme for TAS called TRansformer Architecture search with ZerO-cost pRoxy guided evolution (T-Razor) that achieves exceptional efficiency. First, through theoretical analysis, we discover that the synaptic diversity of multi-head self-attention (MSA) and the saliency of multi-layer perceptron (MLP) are correlated with the performance of the corresponding Transformers. The properties of synaptic diversity and synaptic saliency motivate us to introduce the ranks of synaptic diversity and saliency, denoted as DSS++, for evaluating and ranking Transformers. DSS++ incorporates correlation information among sampled Transformers to provide unified scores for both synaptic diversity and synaptic saliency. We then propose a block-wise evolution search guided by DSS++ to find optimal Transformers. DSS++ determines the positions for mutation and crossover, enhancing the exploration ability. Experimental results demonstrate that T-Razor performs competitively against state-of-the-art manually or automatically designed Transformer architectures across four popular Transformer search spaces. Significantly, T-Razor improves the search efficiency across different Transformer search spaces, e.g., reducing the required GPU days from more than 24 to less than 0.4, and outperforms existing zero-cost approaches. We also apply T-Razor to the BERT search space and find that the searched Transformers achieve competitive GLUE results on several Natural Language Processing (NLP) datasets. This work provides insights into training-free TAS, revealing the usefulness of evaluating Transformers based on the properties of their different blocks.
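
For readers new to the topic, the multi-head self-attention (MSA) block that the abstract above refers to can be sketched as follows. This is a minimal NumPy illustration of standard scaled dot-product attention only; it is not the T-Razor method or its DSS++ proxy. The function names, the weight matrices w_q, w_k, w_v and w_o, and the toy sizes are assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Generic multi-head self-attention over one sequence.

    x: (seq_len, d_model) input tokens.
    w_q, w_k, w_v, w_o: (d_model, d_model) projection matrices (hypothetical names).
    Returns the output (seq_len, d_model) and the attention weights
    (num_heads, seq_len, seq_len).
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product scores, one (seq_len, seq_len) matrix per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values, then merge the heads back together.
    heads = weights @ v
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(5, 8))  # 5 tokens, model width 8 (toy sizes)
    projections = [rng.normal(size=(8, 8)) for _ in range(4)]
    out, attn = multi_head_self_attention(tokens, *projections, num_heads=2)
    print(out.shape, attn.shape)
```

Each head attends over the full sequence in its own lower-dimensional subspace, which is why per-head quantities can be examined separately from the MLP that follows in a Transformer block.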

Purpose: Deformable image registration (DIR) is crucial for improving the precision of clinical diagnosis. Recent Transformer-based DIR methods have shown promising performance by capturing long-range dependencies. Nevertheless, these methods still grapple with high computational complexity. This work aims to enhance the performance of DIR in both computational efficiency and registration accuracy. Methods: We proposed a weakly-supervised lightweight Transformer model, named SparseMorph. To reduce computational complexity without compromising the representative feature capture ability, we designed a sparse multi-head self-attention (SMHA) mechanism. To accumulate representative features while preserving high computational efficiency, we constructed a multi-branch multi-layer perception (MMLP) module. Additionally, we developed an anatomically-constrained weakly-supervised strategy to guide the alignment of regions-of-interest in mono- and multi-modal images. Results: We assessed SparseMorph in terms of registration accuracy and computational complexity. Within the mono-modal brain datasets IXI and OASIS, our SparseMorph outperforms the state-of-the-art method TransMatch with improvements of 3.2 % and 2.9 % in DSC scores for MRI-to-CT registration tasks, respectively. Moreover, in the multi-modal cardiac dataset MMWHS, our SparseMorph shows DSC score improvements of 9.7 % and 11.4 % compared to TransMatch in MRI-to-CT and CT-to-MRI registration tasks, respectively. Notably, SparseMorph attains these performance advantages while utilizing 33.33 % of the parameters of TransMatch. Conclusions: The proposed weakly-supervised deformable image registration model, SparseMorph, demonstrates efficiency in both mono- and multi-modal registration tasks, exhibiting superior performance compared to state-of-the-art algorithms, and establishing an effective DIR method for clinical applications.
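
The DSC figures quoted above are Dice similarity coefficients, a standard overlap measure between two segmentation masks: twice the size of the intersection divided by the sum of the two mask sizes. A minimal sketch of that computation, assuming binary masks and using a hypothetical function name and made-up toy masks (none of this is taken from the SparseMorph paper or its data):

```python
import numpy as np


def dice_coefficient(pred_mask, target_mask):
    """Dice similarity coefficient between two binary masks of equal shape.

    DSC = 2 * |A intersect B| / (|A| + |B|), ranging from 0 (no overlap)
    to 1 (identical masks). Two empty masks are treated as a perfect match,
    which is a convention chosen for this sketch.
    """
    pred = np.asarray(pred_mask, dtype=bool)
    target = np.asarray(target_mask, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / total)


if __name__ == "__main__":
    # Toy 2x3 masks, invented for illustration only.
    warped = [[1, 1, 0],
              [0, 1, 0]]
    fixed = [[1, 0, 0],
             [0, 1, 1]]
    print(dice_coefficient(warped, fixed))
```

In registration work the score is typically computed per anatomical label between the warped moving segmentation and the fixed segmentation, then averaged; the exact protocol varies by study.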

Articles published on Multi-head Self-attention

Personalized multi-head self-attention network for news recommendation

Training-Free Transformer Architecture Search With Zero-Cost Proxy Guided Evolution.

Provenance-based APT campaigns detection via masked graph representation learning

ProTformer: Transformer-based model for superior prediction of protein content in lablab bean (Lablab purpureus L.) using Near-Infrared Reflectance spectroscopy

A high-resolution method for direction of arrival estimation based on an improved self-attention module.

DBCvT: Double Branch Convolutional Transformer for Medical Image Classification

A2HTL: An Automated Hybrid Transformer-Based Learning for Predicting Survival of Esophageal Cancer Using CT Images.

Tumor detection in breast cancer pathology patches using a Multi-scale Multi-head Self-attention Ensemble Network on Whole Slide Images

MSCL-Attention: A Multi-Scale Convolutional Long Short-Term Memory (LSTM) Attention Network for Predicting CO2 Emissions from Vehicles

Multi-Modal Fusion Network with Multi-Head Self-Attention for Injection Training Evaluation in Medical Education

SparseMorph: A weakly-supervised lightweight sparse transformer for mono- and multi-modal deformable image registration

SMLS-YOLO: an extremely lightweight pathological myopia instance segmentation method.

An effective CNN-MHSA method for the fault diagnosis of ZPW-2000A track circuit

Autonomous collision avoidance decision-making method for USV based on ATL-TD3 algorithm

Advancing semantic segmentation of two-dimensional materials using a semantic-adaptive transformer model

Classification of hyperspectral and LiDAR data by transformer-based enhancement

Windformer: A novel 4D high-resolution system for multi-step wind speed vector forecasting based on temporal shifted window multi-head self-attention

Sensitivity Matrix Update Method for Electrical Resistance Tomography Based on Error-Constrained Cross Fusion Residual Attention Network

Residual SwinV2 transformer coordinate attention network for image super resolution

Modeling of joint extraction of entity relationships in clinical electronic medical records
