Abstract

Neural architecture search (NAS) discovers network topologies that improve task performance. Existing hardware-aware NAS techniques target only inference latency on single-CPU/GPU systems, and the searched models can hardly be parallelized across devices. To address this issue, we propose ColocNAS, the first synchronization-aware, end-to-end NAS framework that automates the design of parallelizable neural networks for multidevice systems while maintaining high task accuracy. ColocNAS defines a new search space with carefully designed connectivity to reduce device communication and synchronization. ColocNAS consists of three phases: 1) offline latency profiling, which constructs a lookup table of inference latencies of various networks for online runtime approximation; 2) differentiable latency-aware NAS, which simultaneously minimizes inference latency and task error; and 3) reinforcement-learning-based device placement fine-tuning, which further reduces the latency of the deployed model. Extensive evaluation corroborates ColocNAS's effectiveness in reducing inference latency while preserving task accuracy.
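To make the second phase concrete, the sketch below illustrates one common way a lookup-table latency term is made differentiable (as in DARTS-style NAS): the expected latency is the softmax-weighted sum of profiled per-operator latencies, so it can be added to the task loss and optimized by gradient descent. This is a hedged illustration under assumed values, not ColocNAS's actual implementation; the lookup-table latencies, `alpha`, and the penalty weight `lam` are all hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over architecture parameters.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def expected_latency(alpha, lut_latency):
    # Differentiable latency surrogate: softmax-weighted sum of the
    # offline-profiled per-operator latencies from the lookup table.
    return sum(p * t for p, t in zip(softmax(alpha), lut_latency))

# Hypothetical lookup-table latencies (ms) for four candidate operators.
lut = [0.8, 1.5, 2.3, 0.2]
alpha = [0.0, 0.0, 0.0, 0.0]   # architecture parameters (uniform start)
task_loss = 0.70               # placeholder cross-entropy value
lam = 0.1                      # latency penalty weight (assumed)

# Combined objective: task error plus weighted expected latency.
total = task_loss + lam * expected_latency(alpha, lut)
print(round(total, 4))         # 0.7 + 0.1 * mean(lut) = 0.82
```

In an actual NAS framework this term would be computed with an autodiff library so gradients flow into `alpha`, steering the search toward operators that are both accurate and fast on the profiled hardware.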
