Abstract
We present the first programmable and precision-tunable Stochastic Computing (SC) neural network (NN) inference accelerator. The use of SC makes it possible to achieve a multiply-accumulate (MAC) density of 38.4k MAC/mm², enabling a level of spatial data reuse that conventional fixed-point architectures cannot achieve. This extensive reuse amortizes the cost of SC conversion and reduces the number of memory accesses, which would otherwise cost significant energy and latency. Our accelerator is a stand-alone architecture with a custom instruction set architecture (ISA) and support for end-to-end model inference with convolutional and fully-connected layers of variable input and filter sizes. Further, it demonstrates extensive accuracy-latency trade-offs by varying the stream length. The 14 nm demonstration chip achieves a peak throughput of 2.4 TOPS and a peak energy efficiency of 75 TOPS/W, outperforming comparable fixed-point accelerators.
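To make the stream-length trade-off concrete, the following is a minimal software sketch of stochastic multiplication. It assumes the textbook unipolar encoding, where a value in [0, 1] becomes a bitstream whose fraction of ones approximates the value, and a bitwise AND of two independent streams estimates the product. The abstract does not state which encoding or number generators the accelerator uses, so the function names `to_stream` and `sc_multiply` and the pseudo-random bit generation are illustrative assumptions, not the chip's design.

```python
import random


def to_stream(p, length, rng):
    """Encode p in [0, 1] as a unipolar bitstream (assumed encoding):
    each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a, b, length, seed=0):
    """Estimate a * b by ANDing two independently generated bitstreams
    and taking the fraction of ones in the result."""
    rng = random.Random(seed)
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    return ones / length


if __name__ == "__main__":
    # Longer streams take more cycles (latency) but give a tighter estimate.
    exact = 0.5 * 0.6
    for length in (16, 64, 256, 1024):
        estimate = sc_multiply(0.5, 0.6, length)
        print(f"length={length:5d} estimate={estimate:.4f} "
              f"abs_error={abs(estimate - exact):.4f}")
```

In this sketch the multiplier itself is a single AND per bit, which hints at why SC MAC units can be packed densely, while the estimate's error shrinks roughly with the square root of the stream length, so precision is bought with extra cycles.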