Abstract
Modern applications increasingly rely on the simultaneous execution of multiple DNNs, and Heterogeneous DNN Accelerators (HDAs) have emerged as a solution to this trend. In this work, we propose, implement, and evaluate low-precision Neural Processing Units (NPUs) that serve as building blocks for constructing HDAs, targeting the efficient deployment of multi-DNN workloads. Moreover, we design and evaluate HDA designs that increase overall throughput while reducing energy consumption during NN inference. At design time, we implement HDAs inspired by the big.LITTLE computing paradigm, consisting of 8-bit NPUs combined with lower-precision NPUs. Additionally, an NN-to-NPU scheduling methodology decides at run-time how to map each executed NN to a suitable NPU based on an accuracy-drop threshold value. Our hardware/software co-design reduces the energy and response time of NNs by 29% and 10%, respectively, compared to state-of-the-art homogeneous architectures. This comes with a negligible accuracy drop of merely 0.5%. Similar to the traditional CPU big.LITTLE, our asymmetric NPU design can open new doors for novel DNN accelerator architectures, due to its profound role in increasing the efficiency of DNNs with minimal losses in accuracy.
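The run-time NN-to-NPU scheduling described above can be sketched as a simple threshold-based selection: prefer the lowest-precision (most energy-efficient) NPU whose estimated accuracy drop for the given NN stays within the threshold, otherwise fall back to the high-precision "big" NPU. The NPU names, bit-widths, and accuracy-drop figures below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of a threshold-based NN-to-NPU scheduler.
# All concrete names and numbers here are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class NPU:
    name: str
    bit_width: int  # arithmetic precision of the NPU


def schedule_nn(npus, accuracy_drop, threshold=0.5):
    """Map an NN to the lowest-precision NPU whose estimated accuracy
    drop (in percentage points) stays within `threshold`."""
    # Prefer lower bit-widths: cheaper in energy, but may lose accuracy.
    for npu in sorted(npus, key=lambda n: n.bit_width):
        if accuracy_drop[npu.name] <= threshold:
            return npu
    # Fall back to the highest-precision ("big") NPU if no low-precision
    # NPU meets the accuracy constraint.
    return max(npus, key=lambda n: n.bit_width)


npus = [NPU("big-8bit", 8), NPU("little-4bit", 4)]
# Assumed per-NPU accuracy drops (percentage points) for two example NNs.
drops_nn_a = {"big-8bit": 0.0, "little-4bit": 0.3}
drops_nn_b = {"big-8bit": 0.0, "little-4bit": 2.1}

print(schedule_nn(npus, drops_nn_a).name)  # little NPU meets the threshold
print(schedule_nn(npus, drops_nn_b).name)  # falls back to the big NPU
```

In an actual HDA, the per-NPU accuracy-drop estimates would come from offline profiling of each NN at each precision; the run-time decision itself then reduces to a cheap table lookup as above.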