Attention Heads Research Articles

Autonomous vehicles clearly benefit from the expanded Field of View (FoV) of 360° sensors, but modern semantic segmentation approaches rely heavily on annotated training data, which is rarely available for panoramic images. We look at this problem from the perspective of domain adaptation and bring panoramic semantic segmentation to a setting where labelled training data originates from a different distribution of conventional pinhole camera images. To achieve this, we formalize the task of unsupervised domain adaptation for panoramic semantic segmentation and collect DensePass, a novel densely annotated dataset for panoramic segmentation under cross-domain conditions, specifically built to study the Pinhole → Panoramic domain shift and accompanied by pinhole camera training examples obtained from Cityscapes. DensePass covers both labelled and unlabelled 360° images, with the labelled data comprising 19 classes which explicitly fit the categories available in the source (i.e., pinhole) domain. Since data-driven models are especially susceptible to changes in data distribution, we introduce P2PDA, a generic framework for Pinhole → Panoramic semantic segmentation which addresses the challenge of domain divergence with different variants of attention-augmented domain adaptation modules, enabling the transfer in output, feature, and feature confidence spaces. P2PDA intertwines uncertainty-aware adaptation using confidence values regulated on-the-fly through attention heads with discrepant predictions. Our framework facilitates context exchange when learning domain correspondences and dramatically improves the adaptation performance of accuracy- and efficiency-focused models. Comprehensive experiments verify that our framework clearly surpasses unsupervised domain adaptation and specialized panoramic segmentation approaches as well as state-of-the-art semantic segmentation methods.

Transformer-based models have achieved significant advances in neural machine translation (NMT). The main component of the transformer is the multihead attention layer. In theory, more heads enhance the expressive power of the NMT model, but this is not always the case in practice. On the one hand, the computations of each attention head are conducted in the same subspace, without considering the different subspaces of all the tokens. On the other hand, a low-rank bottleneck may occur when the number of heads surpasses a threshold. To address the low-rank bottleneck, the two mainstream methods either make the head size equal to the sequence length or complicate the distribution of self-attention heads. However, these methods are challenged by the variable sequence length in the corpus and the sheer number of parameters to be learned. Therefore, this paper proposes the interacting-head attention mechanism, which induces deeper and wider interactions across the attention heads through low-dimension computations in different subspaces of all the tokens, and chooses an appropriate number of heads to avoid the low-rank bottleneck. The proposed model was tested on the machine translation tasks IWSLT2016 DE-EN, WMT17 EN-DE, and WMT17 EN-CS. Compared to the original multihead attention, our model improved the performance by 2.78 BLEU/0.85 WER/2.90 METEOR/2.65 ROUGE_L/0.29 CIDEr/2.97 YiSi and 2.43 BLEU/1.38 WER/3.05 METEOR/2.70 ROUGE_L/0.30 CIDEr/3.59 YiSi on the evaluation set and the test set, respectively, for IWSLT2016 DE-EN; by 2.31 BLEU/5.94 WER/1.46 METEOR/1.35 ROUGE_L/0.07 CIDEr/0.33 YiSi and 1.62 BLEU/6.04 WER/1.39 METEOR/0.11 CIDEr/0.87 YiSi on the evaluation set and newstest2014, respectively, for WMT17 EN-DE; and by 3.87 BLEU/3.05 WER/9.22 METEOR/3.81 ROUGE_L/0.36 CIDEr/4.14 YiSi and 4.62 BLEU/2.41 WER/9.82 METEOR/4.82 ROUGE_L/0.44 CIDEr/5.25 YiSi on the evaluation set and newstest2014, respectively, for WMT17 EN-CS.
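The low-rank bottleneck mentioned in the abstract above can be illustrated with a small NumPy sketch. It is not the paper's method: it only assumes the standard multihead setup in which the per-head size is the model width divided by the number of heads, and it uses random Gaussian inputs and projections. The function name `head_score_rank` and all sizes are hypothetical choices for illustration. The point it shows is that one head's pre-softmax score matrix is a product of two matrices with `d_head` columns, so its rank cannot exceed `d_head`, which shrinks as heads are added.

```python
import numpy as np


def head_score_rank(seq_len, d_model, num_heads, seed=0):
    """Return (d_head, rank) for one head's pre-softmax score matrix.

    Assumes d_head = d_model // num_heads, as in standard multihead
    attention. Inputs and projections are random; this is an illustration,
    not a trained model.
    """
    rng = np.random.default_rng(seed)
    d_head = d_model // num_heads

    x = rng.standard_normal((seq_len, d_model))    # token representations
    w_q = rng.standard_normal((d_model, d_head))   # query projection
    w_k = rng.standard_normal((d_model, d_head))   # key projection

    q = x @ w_q                                    # (seq_len, d_head)
    k = x @ w_k                                    # (seq_len, d_head)
    scores = q @ k.T / np.sqrt(d_head)             # (seq_len, seq_len)

    # An explicit absolute tolerance keeps floating-point noise from being
    # counted as extra rank.
    rank = int(np.linalg.matrix_rank(scores, tol=1e-8))
    return d_head, rank


if __name__ == "__main__":
    seq_len, d_model = 32, 64
    for num_heads in (1, 2, 4, 8, 16):
        d_head, rank = head_score_rank(seq_len, d_model, num_heads)
        print(f"heads={num_heads:2d}  d_head={d_head:2d}  "
              f"rank={rank:2d}  bound={min(seq_len, d_head)}")
```

Under these assumptions, the printed rank never exceeds `min(seq_len, d_head)`, so once `d_head` falls below the sequence length a single head can no longer represent an arbitrary `seq_len × seq_len` score pattern.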

Articles published on Attention Heads

ATICVis: A Visual Analytics System for Asymmetric Transformer Models Interpretation and Comparison

Bayesian Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification

ACLMHA and FML: A brain-inspired kinship verification framework.

FormerLeaf: An efficient vision transformer for Cassava Leaf Disease detection

MASPC_Transform: A Plant Point Cloud Segmentation Network Based on Multi-Head Attention Separation and Position Code

MAENet: A novel multi-head association attention enhancement network for completing intra-modal interaction in image captioning

Learning From Demonstrations Via Multi-Level and Multi-Attention Domain-Adaptive Meta-Learning

Muformer: A long sequence time-series forecasting model based on modified multi-head attention

Transfer Beyond the Field of View: Dense Panoramic Semantic Segmentation via Unsupervised Domain Adaptation

Transformer Uncertainty Estimation with Hierarchical Stochastic Attention

Towards Building ASR Systems for the Next Billion Users

Protein sequence profile prediction using ProtAlbert transformer

An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention.

Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction

MEDUSA: Multi-Scale Encoder-Decoder Self-Attention Deep Neural Network Architecture for Medical Image Analysis.

Mixhead: Breaking the low-rank bottleneck in multi-head attention language models

Interpretable Feature Engineering for Time Series Predictors using Attention Networks

Distributed Multi-Attention Generative Adversarial Network for Surrounding Vehicles Trajectories Prediction Based On Comprehensive Social Repulsion

Multilevel Deformable Attention-Aggregated Networks for Change Detection in Bitemporal Remote Sensing Imagery

CoCoPRED: coiled-coil protein structural feature prediction from amino acid sequence using deep neural networks.
