Abstract

Recently, Neural Architecture Search (NAS) has drawn interest from researchers because it can learn neural network architectures from data automatically. Differentiable methods are widely used because they can find good architectures with less computation. However, these methods suffer from a mismatch between training and inference: during training, we minimize the loss on the expected output of an ensemble of models, whereas at test time we run a single model. In this paper, we present a topology-sensitive approach to neural architecture search. Unlike previous work, we do not rely on a search strategy that operates over a different space of architecture topologies than the one used at inference. Instead, the topology of the neural network is explicitly modeled during training and search. We evaluate our method on the PTB and WikiText language modeling tasks. On PTB, it outperforms several strong baselines. Moreover, the newly found architecture transfers well to named entity recognition and machine translation. In particular, it outperforms the strong baseline by 0.6 BLEU on the IWSLT15 English-Vietnamese task and by 0.9 BLEU on the WMT14 English-German task.
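To make the training/inference mismatch concrete, below is a minimal PyTorch sketch of a DARTS-style mixed operation, not the authors' implementation: during search, an edge's output is a softmax-weighted mixture over all candidate operations (the expected output of an ensemble), while at inference only the single highest-weighted operation is kept. All names here (MixedOp, CANDIDATE_OPS, alpha) and the particular candidate operations are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative candidate operations for one edge of the searched cell.
CANDIDATE_OPS = [
    lambda dim: nn.Identity(),
    lambda dim: nn.Linear(dim, dim),
    lambda dim: nn.Sequential(nn.Linear(dim, dim), nn.Tanh()),
]

class MixedOp(nn.Module):
    """One edge of the over-parameterized search network."""

    def __init__(self, dim: int):
        super().__init__()
        self.ops = nn.ModuleList(make(dim) for make in CANDIDATE_OPS)
        # Architecture parameters: one logit per candidate operation,
        # learned jointly with the network weights during search.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor, discrete: bool = False) -> torch.Tensor:
        if discrete:
            # Inference: keep only the highest-weighted operation,
            # i.e. the single concrete model that is actually deployed.
            return self.ops[int(self.alpha.argmax())](x)
        # Search/training: a softmax-weighted mixture over all candidates,
        # so the loss is computed on an ensemble's expected output.
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

x = torch.randn(4, 16)
edge = MixedOp(dim=16)
train_out = edge(x)                 # relaxed (ensemble) output used by the search loss
infer_out = edge(x, discrete=True)  # single-model output used at test time
```

The gap between `train_out` and `infer_out` is exactly the mismatch the abstract describes: the search optimizes the mixture, but the discretized single model is what gets evaluated.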
