Abstract

Neural architecture search (NAS), which automates the exploration of efficient model designs, has achieved ground-breaking advances in recent years. To achieve optimal model latency on a deployment platform, a performance tuning process is usually needed to select reasonable parameters and implementations for each neural network operator. Because this tuning process is time-consuming, it is impractical to tune every candidate architecture generated during the search procedure. Recent NAS systems therefore usually rely on theoretical metrics or rule-based heuristics to approximate on-device latency. Nevertheless, we discovered that there is still a gap between the estimated latency and the optimal latency, potentially leading neural architecture search to sub-optimal solutions. This paper presents an accurate and efficient approach for estimating practical model latency on target platforms, which employs lightweight learning-based predictive models (LBPMs) to obtain realistic deployment-time model latency with acceptable run-time overhead, thereby facilitating hardware-aware neural architecture search. We propose an LBPM-based NAS framework, LBPM-NAS, and evaluate it by searching model architectures for ImageNet classification and facial landmark localization tasks on various hardware platforms. Experimental results show that LBPM-NAS achieves up to a 2.4× performance boost over the baselines at the same level of accuracy.
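The abstract does not describe the LBPM's form, features, or training data; as a rough illustration of the general idea only, the sketch below assumes a candidate architecture can be encoded as a fixed-length feature vector and trains a small regressor (a gradient-boosted tree here, purely an assumption) on a handful of measured latencies, which the search can then query instead of tuning and measuring every candidate on the device.

```python
# Illustrative sketch only: the paper's actual LBPM design, architecture
# encoding, and training data are not specified in the abstract.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def encode_architecture(arch):
    """Encode a candidate architecture as a fixed-length feature vector
    (hypothetical features: per-operator kernel size and channel width)."""
    return np.array([op["kernel"] for op in arch] +
                    [op["channels"] for op in arch], dtype=float)

# A small set of architectures whose latency was actually measured
# (after operator tuning) on the target device; numbers are placeholders.
measured_archs = [
    [{"kernel": 3, "channels": 32}, {"kernel": 5, "channels": 64}],
    [{"kernel": 5, "channels": 48}, {"kernel": 3, "channels": 96}],
    [{"kernel": 3, "channels": 16}, {"kernel": 7, "channels": 128}],
]
measured_latency_ms = [12.1, 18.4, 23.7]

# Train the lightweight predictor on the measured (architecture, latency) pairs.
lbpm = GradientBoostingRegressor(n_estimators=50)
lbpm.fit(np.stack([encode_architecture(a) for a in measured_archs]),
         measured_latency_ms)

# During the search, each candidate is scored with the predictor instead of
# running the expensive tuning/measurement step.
candidate = [{"kernel": 3, "channels": 64}, {"kernel": 5, "channels": 64}]
predicted_ms = lbpm.predict(encode_architecture(candidate).reshape(1, -1))[0]
print(f"predicted latency: {predicted_ms:.1f} ms")
```

Under these assumptions, the predictor's inference cost is negligible compared with on-device tuning, which is what makes per-candidate latency estimates affordable inside the search loop.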
