Although transistor scaling has continued, it no longer yields significant performance gains. In this dark silicon era, with the big-data boom and a growing emphasis on performance, general-purpose architectures will not suffice. Artificial neural networks and deep learning are computationally demanding workloads that cannot be executed optimally, in terms of accuracy, performance, and energy, on today's general-purpose computer architectures, so the demand for domain-specific computer architectures is increasing. To address these problems, this paper proposes and evaluates a highly reconfigurable domain-specific design for deep learning and artificial neural networks: an FPGA-based soft-core processor for deep neural network (DNN) acceleration. By combining unicast and multicast transfers for collective operations, the proposed FPGA accelerator improves the performance of the deep learning network. A thorough evaluation and comparison shows that the proposed design outperforms a current general-purpose processor by 6.6 times while retaining the accuracy of the existing architecture.
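The abstract only names the unicast/multicast combination; the hardware details are in the full paper. Purely as an illustrative sketch of why mixing the two transfer modes helps, the toy model below counts word transfers when weights shared by every processing element (PE) are multicast once versus unicast to each PE separately. NUM_PES, the word counts, and the transfer-counting model are assumptions made for illustration, not the paper's design.

import sys

NUM_PES = 8                 # hypothetical number of processing elements
SHARED_WEIGHTS = 64         # words reused by every PE (multicast candidates)
PRIVATE_ACTIVATIONS = 16    # words unique to each PE (must be unicast)

def unicast_only_transfers() -> int:
    """Every word is sent separately to each PE that needs it."""
    return NUM_PES * (SHARED_WEIGHTS + PRIVATE_ACTIVATIONS)

def unicast_plus_multicast_transfers() -> int:
    """Shared words are broadcast once; only private words are unicast."""
    return SHARED_WEIGHTS + NUM_PES * PRIVATE_ACTIVATIONS

if __name__ == "__main__":
    u = unicast_only_transfers()
    m = unicast_plus_multicast_transfers()
    print(f"unicast only:        {u} word transfers")
    print(f"unicast + multicast: {m} word transfers")
    print(f"reduction:           {u / m:.1f}x fewer transfers on this toy model")
    sys.exit(0)

On this toy model the mixed scheme cuts transfers roughly fourfold; the actual speedup reported in the paper (6.6x over a general-purpose processor) comes from the full hardware design, not from this sketch.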