Abstract
In this paper, we address the problem of neural architecture search (NAS) in a context where the optimality policy is driven by a black-box oracle $\mathcal{O}$ of unknown form and with unknown derivatives. In this scenario, $\mathcal{O}(A_{C})$ typically provides readings from a set of sensors indicating how a neural network architecture $A_{C}$ fares on the target hardware, including its power consumption, working temperature, CPU/GPU usage, central bus occupancy, and more. Current differentiable NAS approaches fail in this setting because derivatives are unavailable, whereas traditional reinforcement learning NAS approaches remain too computationally expensive. As a solution, we propose a reinforcement learning NAS strategy based on policy gradient with increasingly sparse rewards. We rely on the observation [1] that two neural networks can be compared without fully training their weights. Our solution starts by comparing architecture candidates with nearly fixed weights and no training, and progressively shifts toward comparisons under full weight training. Experimental results confirm both the accuracy and the training efficiency of our solution, as well as its compliance with soft/hard constraints imposed on the sensor feedback. Our strategy finds near-optimal architectures significantly faster, in approximately one third of the time it would otherwise take.
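The abstract only sketches the method at a high level; the following is a minimal, hypothetical illustration of such a search loop, not the paper's actual algorithm or API. All names (`oracle`, `reward_fn`, `OPS`, the budget schedule) are assumptions made for illustration, and plain REINFORCE with a moving-average baseline stands in for the unspecified policy-gradient method. The growing per-candidate training budget mirrors the described shift from near-untrained comparisons toward comparisons under full weight training, and the reward applies a hard constraint to one sensor reading.

```python
import math
import random

# Hypothetical sketch: policy-gradient NAS against a black-box oracle
# O(A_C). The policy is a softmax over a tiny discrete space of
# per-layer operations; the training budget per candidate grows over
# the search.

OPS = ["conv3x3", "conv5x5", "sep_conv", "skip"]
NUM_LAYERS = 4

def oracle(arch, train_steps):
    """Stand-in for the black-box oracle O(A_C): in the paper, this
    trains/runs A_C on the target hardware and returns sensor
    readings. Faked here with per-query deterministic random values."""
    rng = random.Random(hash((tuple(arch), train_steps)))
    return {"accuracy": rng.random(),
            "power_w": 5.0 + 10.0 * rng.random()}

def reward_fn(readings, power_cap_w=12.0):
    # Hard constraint on a sensor reading (power); accuracy otherwise.
    if readings["power_w"] > power_cap_w:
        return -1.0
    return readings["accuracy"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

# One softmax policy per layer.
logits = [[0.0] * len(OPS) for _ in range(NUM_LAYERS)]
baseline, lr, T = 0.0, 0.1, 200

for t in range(T):
    # Budget schedule: almost no training early, full training late.
    train_steps = int(10 + (1000 - 10) * t / (T - 1))

    probs = [softmax(layer_logits) for layer_logits in logits]
    arch = [sample(p) for p in probs]

    r = reward_fn(oracle(arch, train_steps))
    baseline = 0.9 * baseline + 0.1 * r
    adv = r - baseline

    # REINFORCE: d/d logit_j of log softmax(a) = 1[j == a] - p_j.
    for layer, a in enumerate(arch):
        for j in range(len(OPS)):
            grad = (1.0 if j == a else 0.0) - probs[layer][j]
            logits[layer][j] += lr * adv * grad

best = [OPS[max(range(len(OPS)), key=lambda j: l[j])] for l in logits]
print("most likely architecture:", best)
```

In this toy version the oracle is queried every iteration; making the rewards "increasingly sparse" as in the paper would additionally space out oracle evaluations as their per-query cost (the training budget) grows.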