Abstract

Deep learning, a significant branch of machine learning, is a rapidly expanding research field with immense potential. In particular, deep neural networks (DNNs), which power countless applications including image recognition, speech recognition, and natural language processing, have attracted global research attention. A DNN is an artificial neural network with multiple layers between the input and output layers. The Softmax function is often used in the final layer of a DNN-based classifier. The Softmax function involves numerous exponential and division operations, resulting in high resource usage when implemented in hardware. In this paper, we present an efficient hardware implementation of the Softmax function with 16-bit fixed-point input and output. During the Softmax calculation, we combine a lookup table and multi-segment linear fitting to handle the exponential operations on the integer and fractional parts, respectively. Furthermore, we adopt a radix-4 Booth-Wallace-based 6-stage pipelined multiplier and a modified shift-compare divider for high efficiency. The overall architecture features a 13-stage pipelined design to improve the operating frequency. Our proposed FPGA implementation attains a precision on the order of 10⁻⁵, and the operating frequencies of the FPGA and ASIC implementations (45 nm technology) reach 396.040 MHz and 3.3 GHz, respectively.
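
The sketch below is a minimal software model of the exponential decomposition described above: the argument is split into integer and fractional parts, the integer part is handled by a lookup table and the fractional part by multi-segment linear fitting, and the results are combined before the final division. The Q8.8 fixed-point format, the 8-segment fit, the table depth, and all coefficient values are illustrative assumptions for this sketch, not the parameters of the published design.

```c
/* Minimal sketch of the LUT + multi-segment linear-fit exponential.
 * Bit widths (Q8.8), segment count, and table sizes are assumptions. */
#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define FRAC_BITS 8                 /* assumed Q8.8 16-bit fixed point   */
#define SEGMENTS  8                 /* assumed number of linear segments */

/* exp(-n) for integer n = 0..15, stored as Q0.16 values (assumption). */
static uint32_t exp_int_lut[16];

/* Piecewise-linear fit of exp(-f) on [0,1): exp(-f) ~= a[k]*f + b[k]. */
static double seg_a[SEGMENTS], seg_b[SEGMENTS];

static void init_tables(void) {
    for (int n = 0; n < 16; n++)
        exp_int_lut[n] = (uint32_t)(exp(-n) * 65536.0 + 0.5);
    for (int k = 0; k < SEGMENTS; k++) {
        double f0 = (double)k / SEGMENTS, f1 = (double)(k + 1) / SEGMENTS;
        seg_a[k] = (exp(-f1) - exp(-f0)) / (f1 - f0);   /* segment slope     */
        seg_b[k] = exp(-f0) - seg_a[k] * f0;            /* segment intercept */
    }
}

/* Approximate exp(-x) for non-negative fixed-point x in Q8.8. */
static double exp_neg_approx(uint16_t x_q88) {
    uint16_t ipart = x_q88 >> FRAC_BITS;                 /* integer part   */
    double   fpart = (x_q88 & 0xFF) / 256.0;             /* fractional part*/
    if (ipart > 15) return 0.0;                          /* underflow      */
    int k = (int)(fpart * SEGMENTS);                     /* segment index  */
    double exp_frac = seg_a[k] * fpart + seg_b[k];       /* linear fit     */
    return (exp_int_lut[ipart] / 65536.0) * exp_frac;    /* combine parts  */
}

/* Softmax over n inputs in Q8.8; subtracting the maximum keeps every
 * exponent argument non-positive, so only exp(-x), x >= 0, is needed. */
static void softmax_q88(const uint16_t *x, double *y, int n) {
    uint16_t xmax = x[0];
    for (int i = 1; i < n; i++) if (x[i] > xmax) xmax = x[i];
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        y[i] = exp_neg_approx((uint16_t)(xmax - x[i]));
        sum += y[i];
    }
    for (int i = 0; i < n; i++) y[i] /= sum;             /* division stage */
}

int main(void) {
    init_tables();
    uint16_t x[4] = { 1 << 8, 2 << 8, 3 << 8, (uint16_t)(2.5 * 256) };
    double y[4];
    softmax_q88(x, y, 4);
    for (int i = 0; i < 4; i++) printf("softmax[%d] = %f\n", i, y[i]);
    return 0;
}
```

In the hardware described by the abstract, the multiplication that combines the two exponential terms and the final division would map to the pipelined Booth-Wallace multiplier and the modified shift-compare divider; this sketch uses plain floating-point arithmetic for those steps purely to illustrate the dataflow.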
