Multi-Scale Self-Attention for Text Classification

Qipeng Guo, Pengfei Liu, Zheng Zhang, Xiangyang Xue, Xipeng Qiu

doi:10.1609/aaai.v34i05.6290

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence
Publication Date: Apr 3, 2020
Citations: 38

Affiliation: Fudan University

Abstract

In this paper, we introduce a prior, multi-scale structure, into self-attention modules. We propose a Multi-Scale Transformer, which uses multi-scale multi-head self-attention to capture features at different scales. Based on a linguistic perspective and an analysis of a pre-trained Transformer (BERT) on a huge corpus, we further design a strategy to control the scale distribution in each layer. Results on three kinds of tasks (21 datasets) show that our Multi-Scale Transformer consistently and significantly outperforms the standard Transformer on small and moderate-size datasets.
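The abstract does not spell out how a head's "scale" is realized. What follows is a minimal NumPy sketch under one assumption: a head of scale ω lets each token attend only to tokens within ⌊ω/2⌋ positions of itself, and ω = None means ordinary global attention. The function names, the `scales` parameter, and the weight shapes are hypothetical and chosen for illustration. This is not the authors' implementation, and it does not reproduce their layer-wise scale strategy. It only lets each layer receive its own `scales` list, so a different mix of small and large scales could be supplied per layer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_mask(n, omega):
    """(n, n) boolean mask: token i may attend to token j iff |i - j| <= omega // 2.

    omega=None is treated as an unrestricted (global) head. This windowed
    definition of "scale" is an assumption made for this sketch.
    """
    if omega is None:
        return np.ones((n, n), dtype=bool)
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= omega // 2

def multi_scale_self_attention(x, w_q, w_k, w_v, w_o, scales):
    """Multi-head self-attention in which head h only sees a window of size scales[h].

    x: (n, d_model); w_q, w_k, w_v: (heads, d_model, d_head);
    w_o: (heads * d_head, d_model); scales: one window size (or None) per head.
    """
    n = x.shape[0]
    d_head = w_q.shape[-1]
    outputs = []
    for h, omega in enumerate(scales):
        q, k, v = x @ w_q[h], x @ w_k[h], x @ w_v[h]
        scores = q @ k.T / np.sqrt(d_head)
        # Positions outside this head's window get -inf, i.e. zero attention.
        # The diagonal is always inside the window, so no row is fully masked.
        scores = np.where(window_mask(n, omega), scores, -np.inf)
        outputs.append(softmax(scores) @ v)
    # Concatenate the per-head outputs and project back to d_model.
    return np.concatenate(outputs, axis=-1) @ w_o

# Usage: four heads ranging from purely local (window 1) to global.
rng = np.random.default_rng(0)
n, d_model, d_head = 7, 16, 4
scales = [1, 3, 5, None]
heads = len(scales)
x = rng.standard_normal((n, d_model))
w_q, w_k, w_v = (rng.standard_normal((heads, d_model, d_head)) for _ in range(3))
w_o = rng.standard_normal((heads * d_head, d_model))
y = multi_scale_self_attention(x, w_q, w_k, w_v, w_o, scales)
print(y.shape)  # (7, 16)
```

Masking scores, rather than slicing out windows, keeps every head on the same (n, n) computation, so small-scale and global heads differ only in their mask. A head of window 1 degenerates to each token attending to itself.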