Abstract

With the continued slowing of Moore's law and the end of Dennard scaling, it has become ever more imperative that hardware designers make the best use of domain-specific information to improve their designs. Gone are the days when we could rely primarily on silicon process technology improvements to provide faster and more efficient computation. Instead, architectural improvements are necessary to deliver higher performance, reduced power, and lower cost. Nowhere is this more apparent than in deep learning workloads. Cutting-edge techniques that achieve state-of-the-art training accuracy demand ever-larger training datasets and more complex network topologies, which results in longer training times. At the same time, once these networks are trained, we expect them to be deployed widely. As a result, executing large networks efficiently becomes critical, whether that execution happens in a data center or in an embedded system. In this article, we examine trends in deep learning research that present new opportunities for domain-specific hardware architectures and explore how next-generation compilation tools might support them.
