Abstract

Thanks to the success of Neural Architecture Search (NAS) in deep learning, humans can hopefully be released from the tremendous labor of manually tuning model structures and hyper-parameters. However, this success comes at the cost of far greater computational resource consumption: thousands of times more compute than the ordinary training of manually designed models, especially for resource-aware multi-objective NAS, which must be serialized into a sequential loop of sampling, training, deployment, and inference. Recent research has shown that deep learning leads to enormous energy consumption and CO2 emissions; training a single Transformer can emit as much CO2 as five cars over their lifetimes (Strubell et al., 2019). To alleviate this issue, we propose an end-to-end inference latency prediction framework that equips the NAS process with a direct resource-aware efficiency indicator. The framework predicts latency quickly and accurately based on a dataset we collected ourselves. Experimentally, with the encoding scheme we designed, our best model, the LSTM-GBDT Latency Predictor (LGLP), achieves 0.9349 MSE, 0.5249 MAE, 0.9842 R2, and 0.9925 correlation coefficient. In other words, even our limited dataset and encoding scheme already provide a precise knowledge representation of this large search space. By equipping NAS with the proposed framework, taking NEMO as an example, we estimate savings of 1588 kWh⋅PUE of energy, 1515 pounds of CO2 emissions, and $3176 in AWS cloud compute cost. Since NAS is now widely exploited in research and industry applications, this can bring substantial benefits to society and the environment.
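The abstract does not spell out the predictor's internals, but the following is a minimal sketch of how an LSTM-GBDT latency predictor and the reported metrics (MSE, MAE, R2, correlation coefficient) could be wired together, assuming the LSTM summarizes a variable-length per-layer architecture encoding into a fixed feature vector that a GBDT then regresses to latency. All names, dimensions, and the synthetic data below are illustrative assumptions, not the authors' actual implementation or dataset.

```python
# Hypothetical LSTM + GBDT latency-prediction pipeline (illustrative only).
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

class ArchEncoder(nn.Module):
    """Encode a sequence of per-layer feature vectors with an LSTM."""
    def __init__(self, layer_feat_dim=8, hidden_dim=32):
        super().__init__()
        self.lstm = nn.LSTM(layer_feat_dim, hidden_dim, batch_first=True)

    def forward(self, x):              # x: (batch, num_layers, layer_feat_dim)
        _, (h_n, _) = self.lstm(x)     # final hidden state summarizes the net
        return h_n[-1]                 # (batch, hidden_dim)

# Synthetic stand-in data: 200 architectures, 12 layers, 8 features per layer.
rng = np.random.default_rng(0)
arch = rng.normal(size=(200, 12, 8)).astype(np.float32)
latency = rng.uniform(5.0, 50.0, size=200)   # fake latencies, e.g. in ms

encoder = ArchEncoder()
with torch.no_grad():
    feats = encoder(torch.from_numpy(arch)).numpy()

# Fit the GBDT on LSTM features; hold out the last 40 samples for evaluation.
gbdt = GradientBoostingRegressor().fit(feats[:160], latency[:160])
pred = gbdt.predict(feats[160:])

# The four metrics reported in the abstract:
mse = mean_squared_error(latency[160:], pred)
mae = mean_absolute_error(latency[160:], pred)
r2 = r2_score(latency[160:], pred)
corr = np.corrcoef(latency[160:], pred)[0, 1]
print(f"MSE={mse:.4f}  MAE={mae:.4f}  R2={r2:.4f}  corrcoef={corr:.4f}")
```

In a real setup the encoder and GBDT would of course be trained on the collected latency dataset rather than random tensors; the split above exists only to exercise the metric computation end to end.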
