Abstract

Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on a wide variety of NLP tasks. Recent work has shown that attention-based models can benefit from more focused attention over local regions. Most existing approaches either restrict the attention scope to a linear span or are confined to specific tasks such as machine translation and question answering. In this paper, we propose a syntax-aware local attention, in which the attention scope is restricted based on distances in the syntactic structure. The proposed syntax-aware local attention can be integrated with pre-trained language models, such as BERT, so that the model focuses on syntactically relevant words. We conduct experiments on various single-sentence benchmarks, including sentence classification and sequence labeling tasks. Experimental results show consistent gains over BERT on all benchmark datasets. Extensive studies verify that our model achieves better performance owing to its more focused attention over syntactically relevant words.
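
The abstract does not reproduce the formulation. As a rough sketch, restricting the attention scope by syntactic distance can be expressed as an additive mask on the standard dot-product attention; the mask symbol M, the tree distance dist(i, j), and the threshold D below are our notational assumptions, not the paper's:

```latex
% Sketch only: M, dist(i, j), and D are assumed notation, not taken from the paper.
\[
\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}} + M\right)V,
\qquad
M_{ij} =
\begin{cases}
0 & \text{if } \operatorname{dist}(i, j) \le D, \\
-\infty & \text{otherwise.}
\end{cases}
\]
```

Positions farther than D hops away in the syntactic structure receive a score of negative infinity and therefore zero attention weight after the softmax.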

Highlights

  • Transformer (Vaswani et al., 2017) has performed remarkably well, building on multi-headed dot-product attention, which fully takes global contextualized information into account (a reference sketch of this attention follows this list)

  • Several studies find that self-attention can be enhanced by local attention, where the attention scopes are restricted to important local regions

  • We propose a syntax-aware local attention (SLA) that is adaptable to several tasks, and integrate it with BERT (Devlin et al., 2019)
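
For reference, here is a minimal PyTorch sketch of the scaled dot-product attention underlying the Transformer (a textbook rendering, not the authors' code); the optional additive mask argument is where a local attention scheme such as SLA would plug in:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Standard scaled dot-product attention (Vaswani et al., 2017).

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    mask:    optional additive mask broadcastable to (batch, heads, seq_len, seq_len),
             with 0 for allowed positions and -inf for blocked ones.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, heads, seq_len, seq_len)
    if mask is not None:
        scores = scores + mask                      # blocked positions become -inf
    weights = F.softmax(scores, dim=-1)             # attention over all (global) positions
    return weights @ v                              # weighted sum of values
```

With mask=None this reduces to the fully global attention of Vaswani et al. (2017); passing an additive 0/-inf mask restricts each query to a subset of positions.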

Summary

Introduction

Transformer (Vaswani et al., 2017) has performed remarkably well, building on multi-headed dot-product attention, which fully takes global contextualized information into account. We propose a syntax-aware local attention (SLA) that is adaptable to several tasks, and integrate it with BERT (Devlin et al., 2019). We first apply dependency parsing to the input text and compute the syntactic distances between input words to construct the self-attention masks. The local attention scores are obtained by applying these masks to the dot-product attention. We then combine the syntax-aware local attention with the Transformer's global attention. We find that the syntax-aware local attention contributes more to the aggregation of local and global attention. Attention visualization confirms that the syntactic information helps capture important local regions. This paper makes the following contributions: i) SLA can capture information from important local regions based on the syntactic structure.
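
The summary above does not give the concrete mask construction, so the following is a minimal sketch under our own assumptions: distances are hop counts in the dependency tree, heads is the head-index array produced by a dependency parser (with the root pointing to itself), and max_dist is a hypothetical threshold. The resulting additive mask would be fed to the dot-product attention as in the earlier sketch, and the masked (local) output aggregated with the unmasked (global) output, for example by a learned weighted combination (again an assumption, not the paper's stated scheme).

```python
import torch

def dependency_distances(heads):
    """Pairwise hop distances between tokens in a dependency tree.

    heads: list where heads[i] is the index of token i's head
           (the root points to itself) -- a common parser output format.
    """
    n = len(heads)
    adj = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h != i:                      # skip the root's self-loop
            adj[i].append(h)
            adj[h].append(i)
    dist = torch.full((n, n), float("inf"))
    for s in range(n):                  # BFS from every token over the undirected tree
        dist[s, s] = 0.0
        queue = [s]
        while queue:
            u = queue.pop(0)
            for w in adj[u]:
                if dist[s, w] == float("inf"):
                    dist[s, w] = dist[s, u] + 1
                    queue.append(w)
    return dist

def syntax_local_mask(heads, max_dist=2):
    """Additive attention mask: 0 within max_dist hops in the tree, -inf otherwise."""
    dist = dependency_distances(heads)
    zero = torch.zeros_like(dist)
    neg_inf = torch.full_like(dist, float("-inf"))
    return torch.where(dist <= max_dist, zero, neg_inf)

# Example: "The cat sat" with heads [1, 2, 2] (token 2, "sat", is the root).
# With max_dist=1, "The" and "sat" (tree distance 2) cannot attend to each other.
mask = syntax_local_mask([1, 2, 2], max_dist=1)
```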

  • Transformer Attention
  • Local Attentions
  • Approach
  • Syntax-aware Local Attention
  • Attention Aggregation
  • Experimental Setup
  • Main Results
  • Conclusion
  • Training Procedure
  • Implementation Details
  • Testing on Chinese Benchmarks
