Abstract

Deep learning has achieved strong results in many practical applications, but network architectures still depend largely on manual design. Neural Architecture Search (NAS) emerged to free architecture design from this manual effort. NAS comprises three main components: the search space, the search strategy, and the performance estimation strategy. Because the search space of NAS is enormous, the search process can become extremely long; a good search strategy can find a high-performance network architecture in a short time. In this paper, we study search strategies for NAS and propose UCB-ENAS, an algorithm based on reinforcement learning that significantly improves search efficiency in a flexible manner. The NAS problem can be regarded as a stateless multi-armed bandit problem, so we combine a long short-term memory (LSTM) network with Upper Confidence Bounds (UCB) to build a controller that generates network architectures, and we then use the policy-based REINFORCE algorithm to update the controller parameters to maximize the expected reward. The controller parameters and the model parameters are optimized alternately. Extensive experiments show that the proposed algorithm searches network architectures quickly and efficiently: it is faster than ENAS in search speed, and the searched architectures outperform those found by DARTS (first order). For example, it achieves a perplexity of 56.54 on the PTB dataset.
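
As a concrete illustration of the search strategy described above, the following sketch shows how a UCB score can guide the choice of each architectural operation and how a REINFORCE-style update then adjusts the controller's policy. This is a minimal toy example of our own construction, not the paper's implementation: the operation list, the simulated reward, the exploration constant C, and the learning rate are all assumptions, and the paper's actual controller is an LSTM rather than the flat preference vector used here.

import math
import random

# Toy search space: one architectural decision among four operations.
OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]

counts = [0] * len(OPS)        # n_a: times each operation was chosen
values = [0.0] * len(OPS)      # running mean reward per operation
prefs  = [0.0] * len(OPS)      # policy preferences updated by REINFORCE
C = 1.4                        # exploration constant (assumed)

def ucb_score(a, t):
    """UCB1: mean reward plus an exploration bonus that shrinks with visits."""
    if counts[a] == 0:
        return float("inf")    # force each arm to be tried at least once
    return values[a] + C * math.sqrt(math.log(t) / counts[a])

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

for t in range(1, 201):
    # Select the operation with the highest upper confidence bound.
    a = max(range(len(OPS)), key=lambda i: ucb_score(i, t))

    # Stand-in for "train the sampled child model and measure its reward"
    # (e.g., validation accuracy); here it is simulated noise.
    reward = random.gauss(0.5 + 0.1 * a, 0.1)

    # Update the running statistics used by the UCB bonus.
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]

    # REINFORCE-style update: move preferences along grad log pi(a) * reward.
    probs = softmax(prefs)
    for i in range(len(OPS)):
        grad = (1.0 if i == a else 0.0) - probs[i]
        prefs[i] += 0.05 * grad * reward   # learning rate assumed

The intuition behind combining the two is that the UCB bonus term, sqrt(log t / n_a), pushes the controller to try rarely sampled operations early on, while the REINFORCE update gradually concentrates probability mass on operations that yielded high reward.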
