Abstract

Modeling the architecture search process on a supernet and applying a differentiable method to learn the importance of architectural choices are among the leading tools for differentiable neural architecture search (DARTS). One fundamental problem in DARTS is how to discretize, i.e., select, a single-path architecture from the pretrained one-shot architecture. Previous approaches mainly rely on heuristic or progressive search methods for discretization and selection, which are inefficient and easily trapped in local optima. To address these issues, we formulate the task of finding a proper single-path architecture as an architecture game among the edges and operations, each with the strategies "keep" and "drop", and show that the optimal one-shot architecture is a Nash equilibrium of this architecture game. We then propose a novel and effective approach to discretizing and selecting a proper single-path architecture: we extract the single-path architecture whose edges and operations carry the maximal Nash-equilibrium coefficient on the strategy "keep". To further improve efficiency, we employ an entangled Gaussian representation of mini-batches, inspired by the classic Parrondo's paradox: if some mini-batches yield uncompetitive strategies, the entanglement of mini-batches ensures that their games are combined and thus turned into strong ones. We conduct extensive experiments on benchmark datasets and demonstrate that our approach is significantly faster than state-of-the-art progressive discretizing methods while maintaining competitive performance with higher maximum accuracy.
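To make the selection rule concrete, the following minimal Python sketch (not the authors' implementation) illustrates how a supernet could be discretized once an equilibrium "keep" coefficient is available for every (edge, operation) pair: on each edge, the operation with the maximal "keep" coefficient is retained. The data layout and names (`keep_probs`, `discretize`, the edge and operation labels) are illustrative assumptions, not part of the paper.

```python
# Hypothetical sketch: discretize a DARTS-style supernet by keeping, on each
# edge, the operation whose Nash-equilibrium mixed strategy assigns the
# largest coefficient to the strategy "keep".
# keep_probs[edge][op] is assumed to hold that equilibrium coefficient.

def discretize(keep_probs):
    """Return {edge: chosen_op} by taking the argmax of the 'keep' coefficients."""
    single_path = {}
    for edge, op_probs in keep_probs.items():
        single_path[edge] = max(op_probs, key=op_probs.get)
    return single_path


# Toy example: two edges, each with three candidate operations.
keep_probs = {
    "edge_0_1": {"skip_connect": 0.2, "sep_conv_3x3": 0.7, "max_pool_3x3": 0.1},
    "edge_0_2": {"skip_connect": 0.5, "sep_conv_3x3": 0.3, "max_pool_3x3": 0.2},
}
print(discretize(keep_probs))
# {'edge_0_1': 'sep_conv_3x3', 'edge_0_2': 'skip_connect'}
```

This is a post-hoc selection step under the stated assumptions; computing the equilibrium coefficients themselves is the subject of the paper's architecture game and is not shown here.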
