Abstract

Owing to the ever-increasing use of neural networks in applications ranging from mobile devices to data centers, hardware accelerators for neural networks have been widely studied in recent years. Recently proposed accelerators mainly exploit sparsity and/or reduced-precision arithmetic to accelerate neural networks. As neural networks grow deeper to handle more complex applications, future hardware accelerators may need to combine sparsity with very low-precision methods; this combination, however, has not been analyzed quantitatively. In this paper, we first introduce an end-to-end FPGA prototyping flow and apply it to a neural network accelerator that supports both fine-grained zero-skipping and very low precision. We report our analyses of resource usage and performance while varying bit-width and zero-data ratio. We conclude with lessons learned from our prototyping study for future zero-aware, very-low-precision accelerators.
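To make the two techniques concrete, the following is a minimal software sketch of a dot product that combines fine-grained zero-skipping with low-bit quantization. It is illustrative only: the quantization scheme (symmetric uniform), the bit-width parameter, and the function names are assumptions for exposition, not the datapath of the accelerator studied in the paper.

```python
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization of x to signed `bits`-bit integers.
    (Illustrative scheme; the paper's accelerator is not specified here.)"""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(x / scale).astype(np.int32), scale

def zero_skipping_dot(acts, weights, bits=4):
    """Dot product over quantized operands that skips zero activations,
    mimicking fine-grained zero-skipping at very low precision."""
    q_a, a_scale = quantize(acts, bits)
    q_w, w_scale = quantize(weights, bits)
    acc = 0
    skipped = 0  # MAC operations avoided thanks to zero activations
    for a, w in zip(q_a, q_w):
        if a == 0:  # fine-grained zero-skipping: no MAC issued
            skipped += 1
            continue
        acc += int(a) * int(w)
    # Rescale the integer accumulation back to the real domain.
    return acc * a_scale * w_scale, skipped

acts = np.array([0.0, 0.5, 0.0, -1.0])
weights = np.array([1.0, 2.0, 3.0, 4.0])
result, skipped = zero_skipping_dot(acts, weights, bits=4)
```

In this toy run, two of the four multiply-accumulates are skipped, and the 4-bit result approximates the exact dot product (-3.0) within quantization error; in hardware, each skipped MAC translates into saved cycles and energy.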
