Abstract
Stream-based computing, such as stochastic computing, has been used in recent years to create designs with significantly smaller area by harnessing unary encoding of data. However, the area saving comes at an exponential price in latency, making the area × delay cost unattractive. In this article, we present a novel method which uses a hybrid binary/unary representation to perform computations. We first divide the input range into a few sub-regions, perform unary computations on each sub-region individually, and finally pack the outputs of all sub-regions back into compact binary. Moreover, we propose a synthesis methodology and a regression model to predict an optimal or close-to-optimal design in the design space. To the best of our knowledge, we are the first to show a scalable method based on parallel bit-stream data representation that beats conventional binary in terms of real cost, i.e., area × delay and energy consumption, on almost all functions that we tried at 8-, 10-, and 12-bit resolutions. Our method outperforms the binary, stochastic, and fully unary methods on a number of functions, especially low-cost binary CORDIC-based functions, and on a common edge detection algorithm, in both FPGA and ASIC implementations. In terms of area × delay cost, our {FPGA, ASIC} cost is on average only {4.72%, 24.36%} and {20.16%, 60.12%} of the parallel binary pipelined implementation at 8- and 10-bit resolution, respectively. These numbers are 2–3 orders of magnitude better than the results of traditional stochastic methods. Our method is not competitive with the parallel CORDIC-based pipelined binary method for high-resolution (12-bit), highly oscillating functions such as $\sin(15x)$. However, for complex functions such as the gamma function, the proposed method beats every other method in terms of area × delay, throughput, latency, and energy per sample. To implement the Roberts cross edge detection algorithm, the proposed method takes 5.7 and 39.45 percent of the area × delay cost of the FPGA and ASIC implementations of the binary method, respectively. In terms of energy efficiency for FPGA implementation, our method uses only 8.4, 12.7, and 27.7 percent of the energy per sample of serial binary implementations at 8-, 10-, and 12-bit resolutions, respectively. These numbers change to 23.9, 38.54, and 99.3 percent compared to parallel binary implementations.
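As a rough behavioral sketch of the decomposition described above (not the paper's hardware architecture), the Python model below splits an n-bit input into a binary sub-region selector and a thermometer-encoded intra-region position, accumulates the unary core's contribution, and adds it to the region's binary bias to recover a compact binary output. All names and parameters here (hybrid_binary_unary_eval, region_bits, etc.) are illustrative assumptions, and the model only mirrors the data flow, not the hardware cost.

```python
def hybrid_binary_unary_eval(f, x, n_bits=8, region_bits=3):
    """Behavioral sketch of a hybrid binary/unary function evaluation.

    f       : target function on [0, 1], e.g. math.sin scaled to that range
    x       : n_bits-wide unsigned binary input sample
    n_bits  : input/output resolution
    region_bits : high-order bits used to select a sub-region in binary
    """
    total = 1 << n_bits
    sub_bits = n_bits - region_bits      # low-order bits handled in unary
    sub_len = 1 << sub_bits              # samples per sub-region

    region = x >> sub_bits               # binary part: which sub-region
    offset = x & (sub_len - 1)           # position inside the sub-region

    # Thermometer (unary) encoding of the intra-region position.
    thermometer = [1] * offset + [0] * (sub_len - 1 - offset)

    # Quantize the target function to n_bits.
    def fq(i):
        return round(f(i / (total - 1)) * (total - 1))

    base = fq(region * sub_len)          # binary bias of this sub-region

    # Unary core: each set thermometer bit contributes the increment between
    # consecutive samples; summing them reproduces f within the sub-region.
    increments = [fq(region * sub_len + i + 1) - fq(region * sub_len + i)
                  for i in range(sub_len - 1)]
    unary_sum = sum(inc for bit, inc in zip(thermometer, increments) if bit)

    return base + unary_sum              # packed back into compact binary


if __name__ == "__main__":
    import math
    # Example: evaluate a scaled sine on an 8-bit input sample.
    y = hybrid_binary_unary_eval(lambda t: 0.5 + 0.5 * math.sin(2 * math.pi * t), 100)
    print(y)
```

In hardware, the per-region unary cores would be realized as low-cost bit-level logic and the final packing as a small adder tree; the sketch only illustrates how the binary region selector and the unary intra-region encoding compose.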