Abstract

Transformers have shown great success in learning representations for language modeling. However, an open challenge remains: how to systematically aggregate semantic information (word embeddings) with positional (or temporal) information (word order). In this work, we propose a new architecture that aggregates these two sources of information, the cascaded semantic and positional self-attention network (CSPAN), in the context of document classification. CSPAN uses a semantic self-attention layer cascaded with a Bi-LSTM to process the semantic and positional information sequentially, and then adaptively combines them through a residual connection. Compared with commonly used positional encoding schemes, CSPAN exploits the interaction between semantics and word positions in a more interpretable and adaptive manner, notably improving classification performance while preserving a compact model size and a high convergence rate. We evaluate CSPAN on several benchmark data sets for document classification with careful ablation studies, and demonstrate encouraging results compared with the state of the art.
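The cascade described above can be sketched in code. The following is a minimal, hypothetical reading of the architecture as a PyTorch-style module; the layer sizes, head count, mean pooling, and the learned gate on the residual connection are illustrative assumptions, not the authors' published configuration.

```python
# Hypothetical sketch of the CSPAN cascade: a semantic self-attention
# layer feeds a Bi-LSTM, and a residual connection adaptively merges
# the two streams. Dimensions and the gating choice are assumptions.
import torch
import torch.nn as nn

class CSPANSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_classes=5):
        super().__init__()
        # Semantic stream: self-attention over word embeddings (order-agnostic).
        self.sem_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Positional stream: a Bi-LSTM consumes the attended sequence in order.
        self.pos_lstm = nn.LSTM(d_model, d_model // 2,
                                bidirectional=True, batch_first=True)
        # Assumed adaptive merge: a learned gate on the residual connection.
        self.gate = nn.Linear(2 * d_model, d_model)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x):                 # x: (batch, seq_len, d_model) embeddings
        sem, _ = self.sem_attn(x, x, x)   # semantic self-attention
        pos, _ = self.pos_lstm(sem)       # positional (temporal) pass over it
        g = torch.sigmoid(self.gate(torch.cat([sem, pos], dim=-1)))
        h = g * sem + (1 - g) * pos       # residual, adaptively combined
        return self.classifier(h.mean(dim=1))  # mean-pool, then classify

# Example: classify a batch of 8 documents of 50 tokens each.
model = CSPANSketch()
logits = model(torch.randn(8, 50, 256))   # -> shape (8, 5)
```

The point of the cascade, as the abstract frames it, is that position is processed on top of the attended semantics rather than injected additively into the embeddings beforehand.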

Highlights

  • Document classification is one of the fundamental problems in natural language processing, aimed at assigning one or more labels to a (typically) short text paragraph

  • Recurrent neural networks (RNNs) are highly effective models for exploiting word order when learning useful representations, thanks to the iterative update of hidden states, which depend on both the semantics of the current word and those of preceding words, and the long-range dependencies made possible by LSTMs (Yang et al., 2016; Stephen et al., 2018; Adhikari et al., 2019)

  • To address these challenges with positional encoding, we explore a new architecture for combining semantic and temporal information in document classification, called the “cascaded semantic and positional self-attention network” (CSPAN); for contrast with the additive encoding scheme, see the sketch after this list
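For contrast, the positional encoding scheme most commonly used with Transformers is the fixed sinusoidal one of Vaswani et al. (2017), which is simply summed onto the word embeddings rather than processed in a separate, cascaded stream. A minimal NumPy sketch (sequence length and dimensions are illustrative):

```python
# Standard additive sinusoidal positional encoding (Vaswani et al., 2017):
# PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
# PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model / 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions
    pe[:, 1::2] = np.cos(angles)               # odd dimensions
    return pe

embeddings = np.random.randn(50, 256)          # 50 tokens, 256-dim embeddings
inputs = embeddings + sinusoidal_positional_encoding(50, 256)
```

In this scheme, semantics and position are entangled by simple addition at the input; CSPAN's claim is that handling them in cascaded stages is more adaptive and interpretable.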


Summary

Introduction

Document classification is one of the fundamental problems in natural language processing, aimed at assigning one or more labels to a (typically) short text paragraph. CNNs have gained huge success in image processing and classification and were recently introduced to NLP tasks such as document classification (Zhang et al., 2015; Lei et al., 2015; Conneau et al., 2016; Kim and Yang, 2018; Kim, 2014). The local convolutional operator is sensitive to word order, but only within the span of its kernel, so capturing long-range relations may require many layers and remains challenging. Transformers, different from both RNNs and CNNs, fully exploit the modeling power of the self-attention mechanism (Shen et al., 2018; Gao et al., 2018; Zheng et al., 2018) and have significantly improved the state of the art in many NLP tasks such as machine translation (Vaswani et al., 2017), language understanding (Devlin et al., 2018), and language modeling (Dai et al., 2019).
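To make the kernel-size limitation concrete: with stride 1 and no dilation, a stack of L one-dimensional convolutions with kernel size k has a receptive field of only 1 + L(k - 1) tokens, so coverage grows linearly with depth. A quick check:

```python
# How slowly the receptive field of stacked 1-D convolutions grows
# (stride 1, no dilation): 1 + L * (k - 1) tokens per output position.
def receptive_field(num_layers, kernel_size):
    return 1 + num_layers * (kernel_size - 1)

# With kernel size 3, covering a 100-token dependency takes ~50 layers.
for layers in (1, 4, 16, 50):
    print(layers, receptive_field(layers, 3))   # 3, 9, 33, 101
```

Self-attention, by contrast, relates every pair of positions in a single layer, which is what makes the Transformer family attractive for long-range dependencies.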
