Abstract

Recently, neural architecture search (NAS) has emerged as a technique of growing interest in automated machine learning (AutoML). Meanwhile, attention-based models, such as attention-based recurrent neural networks and Transformer-based models, have been widely used in deep learning applications. However, no efficient NAS method has so far been able to search the architectures of attention-based models. To address this problem, we propose a framework named neural architecture search for attention-based networks (NASABN), which abstracts attention-based models and extracts the undefined parts of the model, including the attention layers and cells. NASABN is flexible and general enough to fit different NAS methods, and it can be transferred across datasets. We conduct extensive experiments with NASABN using gradient-descent-based methods such as DARTS on the Penn Treebank (PTB) and WikiText-2 (WT2) datasets, and achieve performance competitive with state-of-the-art methods.
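
To make the core idea concrete, the following is a minimal sketch (not the authors' code) of how a DARTS-style continuous relaxation could be applied to a searchable attention layer of the kind the abstract describes. The candidate operation set, class names, and dimensions below are illustrative assumptions only.

    # Sketch: a DARTS-style mixed operation over candidate attention mechanisms.
    # All names and the candidate set are assumptions, not NASABN's actual code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DotProductAttention(nn.Module):
        """Scaled dot-product self-attention over (batch, seq, dim) input."""
        def forward(self, x):
            scores = torch.matmul(x, x.transpose(-2, -1)) / x.size(-1) ** 0.5
            return torch.matmul(F.softmax(scores, dim=-1), x)

    class AdditiveAttention(nn.Module):
        """Additive (Bahdanau-style) self-attention, simplified."""
        def __init__(self, dim):
            super().__init__()
            self.proj = nn.Linear(dim, dim)
            self.v = nn.Linear(dim, 1, bias=False)

        def forward(self, x):
            q = self.proj(x).unsqueeze(2)                    # (B, L, 1, D)
            k = self.proj(x).unsqueeze(1)                    # (B, 1, L, D)
            scores = self.v(torch.tanh(q + k)).squeeze(-1)   # (B, L, L)
            return torch.matmul(F.softmax(scores, dim=-1), x)

    class MixedAttention(nn.Module):
        """Continuous relaxation over candidate attention ops (DARTS-style):
        the output is a softmax-weighted sum of all candidates, and the
        architecture parameters alpha are learned by gradient descent."""
        def __init__(self, dim):
            super().__init__()
            self.ops = nn.ModuleList([
                DotProductAttention(),
                AdditiveAttention(dim),
                nn.Identity(),                               # "skip" candidate
            ])
            self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

        def forward(self, x):
            weights = F.softmax(self.alpha, dim=0)
            return sum(w * op(x) for w, op in zip(weights, self.ops))

    if __name__ == "__main__":
        layer = MixedAttention(dim=64)
        out = layer(torch.randn(2, 10, 64))                  # (batch, seq, dim)
        print(out.shape)                                     # torch.Size([2, 10, 64])

In a bilevel search of this kind, the alpha parameters would be updated on validation data while the operation weights are updated on training data; after search, the highest-weighted candidate would be kept as the discrete attention layer.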
