Abstract

Transformers have shown great success in learning representations for language modeling. However, an open challenge remains: how to systematically aggregate semantic information (word embeddings) with positional (or temporal) information (word order). In this work, we propose a new architecture that aggregates these two sources of information, the cascaded semantic and positional self-attention network (CSPAN), in the context of document classification. CSPAN uses a semantic self-attention layer cascaded with a Bi-LSTM to process the semantic and positional information sequentially, and then adaptively combines them through a residual connection. Compared with commonly used positional encoding schemes, CSPAN exploits the interaction between semantics and word positions in a more interpretable and adaptive manner, notably improving classification performance while preserving a compact model size and a high convergence rate. We evaluate CSPAN on several benchmark data sets for document classification with careful ablation studies, and demonstrate encouraging results compared with the state of the art.
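The cascade described above can be sketched in code. The following is a minimal, hypothetical reading of the architecture as a PyTorch-style module; the layer sizes, head count, mean pooling, and the learned gate on the residual connection are illustrative assumptions, not the authors' published configuration.

```python
# Hypothetical sketch of the CSPAN cascade: a semantic self-attention
# layer feeds a Bi-LSTM, and a residual connection adaptively merges
# the two streams. Dimensions and the gating choice are assumptions.
import torch
import torch.nn as nn

class CSPANSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_classes=5):
        super().__init__()
        # Semantic stream: self-attention over word embeddings (order-agnostic).
        self.sem_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Positional stream: a Bi-LSTM consumes the attended sequence in order.
        self.pos_lstm = nn.LSTM(d_model, d_model // 2,
                                bidirectional=True, batch_first=True)
        # Assumed adaptive merge: a learned gate on the residual connection.
        self.gate = nn.Linear(2 * d_model, d_model)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x):                 # x: (batch, seq_len, d_model) embeddings
        sem, _ = self.sem_attn(x, x, x)   # semantic self-attention
        pos, _ = self.pos_lstm(sem)       # positional (temporal) pass over it
        g = torch.sigmoid(self.gate(torch.cat([sem, pos], dim=-1)))
        h = g * sem + (1 - g) * pos       # residual, adaptively combined
        return self.classifier(h.mean(dim=1))  # mean-pool, then classify

# Example: classify a batch of 8 documents of 50 tokens each.
model = CSPANSketch()
logits = model(torch.randn(8, 50, 256))   # -> shape (8, 5)
```

The point of the cascade, as the abstract frames it, is that position is processed on top of the attended semantics rather than injected additively into the embeddings beforehand.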

Highlights

  • Document classification is one of the fundamental problems in natural language processing, aimed at assigning one or more labels to a (typically) short text paragraph

  • Recurrent neural networks (RNNs) are highly effective models for exploiting word order when learning useful representations, thanks to the iterative update of hidden states, which depend on both the semantics of the current word and those of preceding words, and the long-range dependencies made possible by LSTMs (Yang et al., 2016; Stephen et al., 2018; Adhikari et al., 2019)

  • To address these challenges with positional encoding, we explore a new architecture for combining semantic and temporal information in document classification, called the “cascaded semantic and positional self-attention network” (CSPAN); for contrast with the additive encoding scheme, see the sketch after this list
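For contrast, the positional encoding scheme most commonly used with Transformers is the fixed sinusoidal one of Vaswani et al. (2017), which is simply summed onto the word embeddings rather than processed in a separate, cascaded stream. A minimal NumPy sketch (sequence length and dimensions are illustrative):

```python
# Standard additive sinusoidal positional encoding (Vaswani et al., 2017):
# PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
# PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model / 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions
    pe[:, 1::2] = np.cos(angles)               # odd dimensions
    return pe

embeddings = np.random.randn(50, 256)          # 50 tokens, 256-dim embeddings
inputs = embeddings + sinusoidal_positional_encoding(50, 256)
```

In this scheme, semantics and position are entangled by simple addition at the input; CSPAN's claim is that handling them in cascaded stages is more adaptive and interpretable.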


Summary

Introduction

Document classification is one of the fundamental problems in natural language processing, aimed at assigning one or more labels to a (typically) short text paragraph. CNNs have gained huge success in image processing and classification and were recently introduced to NLP tasks such as document classification (Zhang et al., 2015; Lei et al., 2015; Conneau et al., 2016; Kim and Yang, 2018; Kim, 2014). The local convolutional operator is sensitive to word order, but only within the span of its kernel, so capturing long-range relations may require many layers and remains challenging. Transformers, different from both RNNs and CNNs, fully exploit the modeling power of the self-attention mechanism (Shen et al., 2018; Gao et al., 2018; Zheng et al., 2018) and have significantly improved the state of the art in many NLP tasks such as machine translation (Vaswani et al., 2017), language understanding (Devlin et al., 2018), and language modeling (Dai et al., 2019).
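To make the kernel-size limitation concrete: with stride 1 and no dilation, a stack of L one-dimensional convolutions with kernel size k has a receptive field of only 1 + L(k - 1) tokens, so coverage grows linearly with depth. A quick check:

```python
# How slowly the receptive field of stacked 1-D convolutions grows
# (stride 1, no dilation): 1 + L * (k - 1) tokens per output position.
def receptive_field(num_layers, kernel_size):
    return 1 + num_layers * (kernel_size - 1)

# With kernel size 3, covering a 100-token dependency takes ~50 layers.
for layers in (1, 4, 16, 50):
    print(layers, receptive_field(layers, 3))   # 3, 9, 33, 101
```

Self-attention, by contrast, relates every pair of positions in a single layer, which is what makes the Transformer family attractive for long-range dependencies.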
