Abstract

The computational environment of Deep Learning Neural Networks (DLNNs) differs considerably from that of conventional computer systems. DLNNs require thousands, if not millions, of compute cores, compared to one or a few in conventional systems. There is therefore a need to revisit performance issues to gain a better understanding of how systems behave in such massively parallel architectures. Precision, speed, memory access, bus contention, resource sharing, and chip area are some of the key issues that need to be studied in this changed context. Low-precision multiplication remains one of the most commonly used operations in neural computations. This paper draws the reader's attention to some interesting results in area-speed trade-offs when applied to massively parallel architectures. A new low-precision fixed-point representation is discussed. A hardware accelerator and the software components used in the simulation are briefly described. Results show that serial multipliers can outperform parallel multipliers when throughput per unit area of silicon is considered.
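
The abstract does not specify the proposed representation or multiplier designs; as a generic illustration only, the sketch below shows what a low-precision fixed-point multiply looks like when computed with a shift-and-add (bit-serial style) loop, the kind of structure that trades cycles for silicon area. The word width, number of fractional bits, and function names are assumptions made for the example, not the paper's actual design.

    # Illustrative sketch only; widths and format are assumed, not taken from the paper.
    FRAC_BITS = 4   # assumed number of fractional bits
    WIDTH = 8       # assumed total word width

    def to_fixed(x: float) -> int:
        """Quantize a real value to a signed WIDTH-bit fixed-point integer (with saturation)."""
        v = int(round(x * (1 << FRAC_BITS)))
        lo, hi = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1
        return max(lo, min(hi, v))

    def serial_mul(a: int, b: int) -> int:
        """Shift-and-add multiply, mimicking a bit-serial multiplier:
        one partial product is accumulated per loop iteration (i.e. per cycle)."""
        neg = (a < 0) ^ (b < 0)
        a, b = abs(a), abs(b)
        acc = 0
        for i in range(WIDTH):          # WIDTH cycles for a WIDTH-bit operand
            if (b >> i) & 1:
                acc += a << i           # add the shifted multiplicand
        return -acc if neg else acc

    def fixed_mul(a: int, b: int) -> int:
        """Fixed-point product: multiply, then rescale back to FRAC_BITS."""
        return serial_mul(a, b) >> FRAC_BITS

    # Example: 1.25 * -0.75 = -0.9375 at 4 fractional bits
    a, b = to_fixed(1.25), to_fixed(-0.75)
    print(fixed_mul(a, b) / (1 << FRAC_BITS))   # -> -0.9375

A parallel (combinational) multiplier computes all partial products at once and is faster per operation, but occupies more area; the paper's reported result is that, per unit area, many such serial units can deliver higher aggregate throughput.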
