Neural Architecture Search (NAS) has garnered significant attention for its ability to automatically design high-quality deep neural networks (DNNs) tailored to various hardware platforms. The major challenge for NAS is the time-consuming network estimation process required to select optimal networks from a large pool of candidates. Rather than training each candidate from scratch, recent one-shot NAS methods accelerate the estimation process by training only a supernet and sampling sub-networks from it that inherit partial architectures and weights. Despite this significant acceleration, supernet training over a large search space (i.e., a large number of candidate sub-networks) still requires thousands of GPU hours to support high-quality sub-network sampling. In this work, we propose SparseNAS, an approach for one-shot NAS acceleration that reduces the redundancy of the search space. We observe that many sub-networks in the space underperform, exhibiting a significant performance gap relative to high-performing sub-networks. Crucially, this disparity emerges early in supernet training. We therefore train an early predictor to learn this disparity and filter out underperforming networks in advance, and then conduct supernet training in the resulting sub-space. Compared to the state-of-the-art one-shot NAS, SparseNAS achieves a 3.1× training speedup with comparable network performance on the ImageNet dataset. Compared to the state-of-the-art acceleration method, SparseNAS achieves up to 1.5% higher Top-1 accuracy and a 28% reduction in training cost with a 7× larger search space. Extensive experimental results demonstrate that SparseNAS achieves better trade-offs between efficiency and performance than state-of-the-art one-shot NAS.
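To make the early-filtering idea concrete, below is a minimal sketch of predictor-based search-space pruning. It is an illustration only: the function names, the proxy scoring, the architecture encoding, and the use of a random-forest predictor are assumptions for demonstration and are not taken from the paper, which does not specify this exact pipeline in the abstract.

```python
# Minimal sketch (assumed, not the paper's implementation): score a small
# sample of sub-networks with a cheap early proxy, fit a predictor on their
# architecture encodings, and keep only candidates predicted to perform well
# before running full supernet training on the reduced sub-space.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)


def encode_arch(arch):
    """Flatten a sub-network description (e.g., per-layer choices) to a vector."""
    return np.asarray(arch, dtype=np.float32)


def early_proxy_score(arch):
    """Placeholder for a cheap early signal, e.g., accuracy of `arch` measured
    on a briefly trained supernet. Simulated here with a noisy function."""
    return float(np.tanh(encode_arch(arch).mean()) + rng.normal(scale=0.05))


# 1) Sample candidate sub-networks and score a small labelled subset early.
candidates = [rng.integers(0, 4, size=20) for _ in range(512)]
labelled = candidates[:128]
X = np.stack([encode_arch(a) for a in labelled])
y = np.array([early_proxy_score(a) for a in labelled])

# 2) Train an early predictor on (architecture encoding, early score) pairs.
predictor = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# 3) Filter the full candidate pool: keep only predicted high performers.
scores = predictor.predict(np.stack([encode_arch(a) for a in candidates]))
keep = scores >= np.quantile(scores, 0.75)  # e.g., top 25% form the sub-space
reduced_space = [a for a, k in zip(candidates, keep) if k]
print(f"reduced search space: {len(reduced_space)} / {len(candidates)} candidates")

# 4) Supernet training (not shown) would then sample only from `reduced_space`.
```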