Abstract
Architecture search is the automatic process of designing the model or cell structure that is optimal for a given dataset or task. Recently, a weight-sharing-based approach called Efficient Neural Architecture Search (ENAS) has achieved strong performance on language modeling and image classification with reasonable training speed. In this work, we propose a novel architecture search algorithm called Flexible and Expressible Neural Architecture Search (FENAS), with a more flexible and expressible search space than ENAS in terms of activation functions, input edges, and atomic operations. Unlike ENAS, our FENAS approach can reproduce the well-known LSTM and GRU architectures, and can also be initialized with them to find architectures more efficiently. We explore this extended search space via evolutionary search and show that FENAS performs significantly better on several popular text classification tasks while performing similarly to ENAS on a standard language modeling benchmark. Further, we present ablations and analyses of our FENAS approach.
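As a minimal, hypothetical sketch of the evolutionary exploration mentioned above (the paper's exact selection and mutation scheme is not shown here; `mutate` and `fitness` are assumed callbacks, e.g., rewiring an input edge or swapping an activation, and measuring validation performance), the loop below keeps the fittest architectures of each generation and refills the population with mutated copies. Per the abstract, the initial population could be seeded with LSTM/GRU cells:

```python
import random

def evolutionary_search(init_population, mutate, fitness, generations=50, top_k=10):
    """Truncation-selection evolutionary search over cell architectures.

    init_population: list of candidate architectures (e.g., seeded with LSTM/GRU)
    mutate: callback that returns a perturbed copy of an architecture
    fitness: callback that scores an architecture (e.g., validation accuracy)
    """
    population = list(init_population)
    for _ in range(generations):
        # Rank the current population by fitness and keep the top_k survivors.
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:top_k]
        # Refill the population with mutated copies of random survivors.
        num_children = len(population) - len(parents)
        children = [mutate(random.choice(parents)) for _ in range(num_children)]
        population = parents + children
    return max(population, key=fitness)
```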
Highlights
Architecture search enables automatic ways of finding the best model architecture and cell structures for the given task or dataset, as opposed to the traditional approach of manually tuning among different architecture choices
Compared with previous neural architecture search (NAS) approaches, our Flexible and Expressible Neural Architecture Search (FENAS) performs comparably on Penn Treebank (PTB) and significantly better on several downstream GLUE tasks
The FENAS search space is larger than that of Efficient Neural Architecture Search (ENAS) because it allows more activation functions and more inputs to the computational nodes, as illustrated in the sketch below
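To make the larger-search-space claim concrete, here is a minimal, hypothetical PyTorch sketch of one searchable node (the class name, per-edge linear parameterization, and the exact activation/operation sets are illustrative assumptions, not the paper's code). Each node draws on a configurable number of input edges, picks an activation from a wider pool, and combines its inputs with a chosen atomic operation:

```python
import torch
import torch.nn as nn

# A wider activation pool than ENAS's; the exact set here is an assumption.
ACTIVATIONS = {
    "tanh": torch.tanh,
    "relu": torch.relu,
    "sigmoid": torch.sigmoid,
    "identity": lambda x: x,
}

class FenasNode(nn.Module):
    """One node of a sampled cell: combine the chosen inputs, then activate."""

    def __init__(self, hidden_size, num_inputs, activation="tanh", combine="add"):
        super().__init__()
        # One linear transform per incoming edge (assumed parameterization).
        self.edges = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(num_inputs)]
        )
        self.activation = ACTIVATIONS[activation]
        self.combine = combine  # atomic operation: "add" or elementwise "mul"

    def forward(self, inputs):
        # inputs: list of num_inputs tensors, each of shape (batch, hidden_size)
        transformed = [edge(x) for edge, x in zip(self.edges, inputs)]
        out = transformed[0]
        for t in transformed[1:]:
            out = out + t if self.combine == "add" else out * t
        return self.activation(out)
```

A node such as `FenasNode(64, num_inputs=2, activation="sigmoid", combine="mul")` behaves like a learned gate; composing such sigmoid-gated, elementwise-multiplied nodes is what lets this kind of search space express LSTM- and GRU-style cells, which (per the abstract) ENAS cannot reproduce.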
Summary
Architecture search enables automatic ways of finding the best model architecture and cell structures for the given task or dataset, as opposed to the traditional approach of manually tuning among different architecture choices. This idea has been successfully applied to the tasks of language modeling and image classification (Zoph and Le, 2017; Zoph et al., 2018; Cai et al., 2018; Liu et al., 2018a,b). The first approach to architecture search involved an RNN controller that samples a model architecture and uses the validation performance of this architecture, trained on the given dataset, as feedback (or reward) to sample the next architecture.
[Figure: an example recurrent cell, in which nodes combine the inputs x[t] and h[t-1] through activations such as tanh and ReLU and an add operation to produce the output h[t].]
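As a hedged illustration of this controller loop (not the authors' implementation; `controller`, `build_model`, `train`, and `evaluate` are hypothetical interfaces standing in for a real training pipeline), the reward-driven search reads roughly as:

```python
def architecture_search(controller, build_model, train_data, val_data, steps=100):
    """Controller-based NAS loop in the style of Zoph and Le (2017)."""
    best_arch, best_reward = None, float("-inf")
    for _ in range(steps):
        arch, log_prob = controller.sample()   # controller proposes a cell
        child = build_model(arch)              # instantiate the child model
        train(child, train_data)               # fit it on the target task
        reward = evaluate(child, val_data)     # validation score as reward
        controller.update(log_prob, reward)    # e.g., a REINFORCE step
        if reward > best_reward:
            best_arch, best_reward = arch, reward
    return best_arch
```

Training every sampled child model from scratch is expensive; weight sharing across sampled architectures is what makes ENAS (and, in turn, FENAS) efficient.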