Abstract

The rise of BERT-style pre-trained models has significantly improved natural language understanding (NLU) tasks. However, for industrial use, we still have to rely on more traditional models for efficiency. In this paper, we therefore present AutoNLU, which is designed to model sentence representation and cross-sentence attention through automatic neural architecture search (NAS). We make two main contributions. First, we design a novel and comprehensive search space consisting of encoder operations, aggregator operations, and important design choices. Second, targeting sentence-pair tasks, we use NAS to automatically model how the representations of two sentences interact with and attend to each other. A reinforcement learning (RL) based search algorithm is enhanced with cross-operation and cross-layer parameter sharing for efficient and reliable search. Models are trained by distilling knowledge from BERT teachers. Experiments on SST-2, RTE, SciTail, and CoNLL 2003 verify that our learned models are better at learning from BERT teachers than other baseline models. Ablation studies on SciTail show that our search space design is valid and that our proposed strategies improve the search results. (The source code will be made publicly available.)
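The abstract states that the searched models are trained by distilling knowledge from BERT teachers. The paper's exact training objective is not given here; the following is a minimal, hypothetical sketch of a standard soft-label distillation loss (temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not values from the paper), shown only to make the distillation setup concrete.

```python
# Minimal sketch of knowledge distillation from a BERT teacher (not the
# authors' implementation): the student matches the teacher's softened
# output distribution while also fitting the gold labels.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-softened
    # student and teacher distributions, scaled by T^2 as is customary.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```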
