Abstract
Industry increasingly adopts Convolutional Neural Networks (CNNs) in applications ranging from IoT to autonomous driving. Convolutional hardware accelerators for the inference phase are an alternative to CPUs and GPUs due to their lower power consumption and improved performance. The literature presents hardware accelerators using different 2D architectures, including weight stationary (WS), input stationary (IS), and output stationary (OS). The main differentiation between these architectures is how accelerators access data (the input feature map and weight tensors) and compute the output (the output feature map tensor). There is a gap in the literature related to a comprehensive evaluation of such architectures. This brief aims to answer the following question: “which accelerator type should I use according to my design constraints and memory type (SRAM or DRAM)?” Experiments show that when using SRAM as the memory external to the accelerator, WS presents the smallest area and energy consumption, while IS presents the best performance. On the other hand, the IS accelerator stands out when using DRAM because its performance is less sensitive to memory latency.
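To make the distinction between the dataflows concrete, the following sketch (illustrative only, not from the paper) shows a 1D convolution written with three loop orderings. All three produce the same output feature map; they differ in which operand is held fixed ("stationary") across the inner loop, which is what determines the data-reuse pattern of each accelerator architecture. Function names and the 1D simplification are assumptions for illustration.

```python
def conv1d_weight_stationary(ifmap, weights):
    # WS: each weight is loaded once and reused across the whole input.
    ofmap = [0.0] * (len(ifmap) - len(weights) + 1)
    for k, w in enumerate(weights):            # weight stays fixed here
        for o in range(len(ofmap)):
            ofmap[o] += w * ifmap[o + k]
    return ofmap

def conv1d_input_stationary(ifmap, weights):
    # IS: each input element is loaded once and sent to every output it feeds.
    ofmap = [0.0] * (len(ifmap) - len(weights) + 1)
    for i, x in enumerate(ifmap):              # input stays fixed here
        for k, w in enumerate(weights):
            o = i - k                          # outputs that use ifmap[i]
            if 0 <= o < len(ofmap):
                ofmap[o] += w * x
    return ofmap

def conv1d_output_stationary(ifmap, weights):
    # OS: each output accumulator stays in place until fully computed.
    ofmap = []
    for o in range(len(ifmap) - len(weights) + 1):
        acc = 0.0                              # output stays fixed here
        for k, w in enumerate(weights):
            acc += w * ifmap[o + k]
        ofmap.append(acc)
    return ofmap
```

In hardware, the "stationary" operand is the one pinned in a processing element's local registers, so the choice of loop ordering maps directly to which tensor enjoys the most reuse and which tensors stream from memory on every cycle.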