Abstract

Neural architecture search (NAS) has attracted much research attention owing to its ability to identify better architectures than manually designed ones. Recently, differentiable neural architecture search methods have been widely used due to their impressive effectiveness and performance. They represent the network architecture as a repetitive proxy directed acyclic graph (DAG) and optimize the network weights and architecture weights alternately in a differentiable manner. However, existing methods model the architecture weights on each edge (i.e., a layer in the network) as statistically independent variables, ignoring the dependency between edges in the DAG induced by their directed topological connections. In this paper, we make the first attempt to investigate such a dependency by proposing a novel inter-layer transition NAS method. It casts the architecture optimization into a sequential decision process in which the dependency between the architecture weights of connected edges is explicitly modeled. Specifically, edges are divided into inner and outer groups according to whether or not their predecessor edges lie in the same cell. While the architecture weights of outer edges are optimized independently, those of inner edges are derived sequentially from the architecture weights of their predecessor edges and learnable transition matrices in an attentive probability transition manner. Experiments on five benchmark classification datasets, four search spaces, and NAS-Bench-201 confirm the value of modeling inter-layer dependency and demonstrate that the proposed method outperforms other methods.
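The following is a minimal sketch, not the authors' implementation, of how an inner edge's architecture weights could be derived from its predecessors via learnable transition matrices combined by attention, as described above. All names and shapes (`num_ops`, the transition tensor `T`, the attention logits) are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch of an attentive probability transition for one inner edge.
# Assumptions: each edge chooses among `num_ops` candidate operations, and the
# inner edge has `num_preds` predecessor edges whose weights are already known.
import torch
import torch.nn.functional as F

num_ops = 8    # candidate operations per edge (assumed)
num_preds = 2  # predecessor edges feeding the inner edge (assumed)

# Independently optimized architecture weights of the predecessor (outer) edges.
alpha_pred = torch.randn(num_preds, num_ops, requires_grad=True)

# Learnable transition matrices: one per predecessor, mapping its operation
# distribution to a distribution over the inner edge's operations.
T = torch.randn(num_preds, num_ops, num_ops, requires_grad=True)

# Learnable attention logits weighting the contribution of each predecessor.
attn_logits = torch.zeros(num_preds, requires_grad=True)

def inner_edge_weights(alpha_pred, T, attn_logits):
    """Derive the inner edge's operation distribution from its predecessors."""
    p_pred = F.softmax(alpha_pred, dim=-1)   # (num_preds, num_ops)
    trans = F.softmax(T, dim=-1)             # row-stochastic transition matrices
    # Probability transition: propagate each predecessor's distribution.
    p_trans = torch.einsum('ko,kop->kp', p_pred, trans)  # (num_preds, num_ops)
    attn = F.softmax(attn_logits, dim=-1)    # attention over predecessors
    return (attn.unsqueeze(-1) * p_trans).sum(dim=0)     # (num_ops,)

p_inner = inner_edge_weights(alpha_pred, T, attn_logits)
print(p_inner.sum())  # ~1.0: a valid distribution over candidate operations
```

In this sketch the inner edge's weights are a function of its predecessors' weights rather than free variables, so gradients flow back through the transition matrices and attention logits during the architecture-update step, which is the sequential dependency the abstract describes.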
