Abstract

With the continued slowing of Moore's law and the end of Dennard scaling, it has become ever more imperative that hardware designers make the best use of domain-specific information to improve their designs. Gone are the days when we could rely primarily on silicon process technology improvements to provide faster and more efficient computation. Instead, architectural improvements are necessary to deliver higher performance, reduced power, and lower cost. Nowhere is this more apparent than in deep learning workloads. Cutting-edge techniques that achieve state-of-the-art training accuracy demand ever-larger training datasets and more complex network topologies, which results in longer training times. At the same time, once these networks are trained, we expect them to be deployed widely. As a result, executing large networks efficiently becomes critical, whether that execution happens in a data center or in an embedded system. In this article, we examine trends in deep learning research that present new opportunities for domain-specific hardware architectures and explore how next-generation compilation tools might support them.
