Abstract

Deep learning has attracted growing attention as artificial intelligence continues to advance. At the same time, hardware restrictions in real applications have driven the investigation of combining another rising technique, stochastic computing (SC), with deep learning systems to achieve low power consumption. To date, successfully implemented operations include addition, multiplication, and the inner product, as well as more complicated nonlinear functions such as the hyperbolic tangent (tanh) realized with linear finite state machines (FSMs). The inner product implementation realizes convolution, a core operation of neural networks, thereby encouraging SC-based implementations of deep learning neural networks. However, extremely long bitstreams are needed to achieve satisfactory accuracy, especially in large-scale deep learning systems, which causes latency issues. Parallelism is therefore considered as a way to alleviate this latency. In this paper, an optimization of SC-based deep learning systems is proposed that replaces the serial FSM implementations commonly used in previous work with parallel ones. By substituting each serial linear FSM with several parallel linear FSMs of the same size operating on shorter bitstreams, the parallel FSM approach trades hardware for processing latency. The accuracy of a sample parallel FSM unit is first evaluated against its serial counterpart; a case study then verifies that the replacement sacrifices little accuracy while reducing computing time exponentially in actual deep learning system realizations.
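To make the serial-versus-parallel FSM idea concrete, the following is a minimal Python sketch, not taken from the paper, of a linear-FSM tanh unit (Stanh) applied once to a single long bipolar bitstream and, alternatively, as several identical FSMs applied to shorter bitstreams whose decoded outputs are averaged. The bipolar encoding, the state count, and function names such as stanh_fsm are illustrative assumptions.

```python
import numpy as np

def to_bipolar_bitstream(x, length, rng):
    """Encode x in [-1, 1] as a bipolar stochastic bitstream:
    each bit is 1 with probability (x + 1) / 2."""
    return rng.random(length) < (x + 1) / 2

def from_bipolar_bitstream(bits):
    """Decode a bipolar bitstream back to a value in [-1, 1]."""
    return 2 * bits.mean() - 1

def stanh_fsm(bits, num_states=8):
    """Linear-FSM approximation of tanh (Stanh): the state moves up on an
    input 1 and down on an input 0, and the output bit is 1 while the state
    is in the upper half.  With K = num_states this approximates
    tanh(K/2 * x) for a bipolar-encoded input x."""
    state = num_states // 2
    out = np.empty_like(bits)
    for i, b in enumerate(bits):
        state = min(state + 1, num_states - 1) if b else max(state - 1, 0)
        out[i] = state >= num_states // 2
    return out

rng = np.random.default_rng(0)
x, total_len, num_parallel = 0.4, 4096, 8

# Serial baseline: one FSM processes one long bitstream.
serial = from_bipolar_bitstream(stanh_fsm(to_bipolar_bitstream(x, total_len, rng)))

# Parallel variant: several identical FSMs each process a shorter bitstream,
# and their decoded outputs are averaged.
chunk = total_len // num_parallel
parallel = np.mean([
    from_bipolar_bitstream(stanh_fsm(to_bipolar_bitstream(x, chunk, rng)))
    for _ in range(num_parallel)
])

print(f"tanh(4 * 0.4) = {np.tanh(4 * x):.3f}, "
      f"serial = {serial:.3f}, parallel = {parallel:.3f}")
```

Under this toy model the parallel FSMs consume only total_len / num_parallel clock cycles of bitstream each, illustrating the hardware-for-latency trade described above; the accuracy comparison in the paper is of course carried out on the actual hardware designs rather than this software sketch.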
