Abstract

Given the popularity of decimal arithmetic, hardware implementation of decimal operations has been an active research topic in recent decades. Besides the four basic operations, the square root can be implemented directly in hardware as an instruction, which improves the performance of a processor's decimal floating-point unit. Hardware decimal square rooters are usually built on either functional or digit-recurrence algorithms. Functional algorithms require a multiplication in every iteration and therefore seem poorly suited to decimal square roots, given the high cost of decimal multipliers. Digit-recurrence square root algorithms, on the other hand, and particularly SRT algorithms (named after their creators, Sweeney, Robertson, and Tocher), are simple and well suited to decimal arithmetic. With the aim of reducing the latency of the decimal square root operation while keeping the cost reasonable, this paper proposes an SRT algorithm and the corresponding hardware architecture for computing the decimal square root. The proposed fixed-point square root design requires n+3 cycles to compute an n-digit root; synthesis results show an area cost of about 31K NAND2 and a cycle time of 40 FO4. These results give the proposed decimal square root architecture a 14% speed advantage over the fastest previous work (which uses a functional algorithm), at about a quarter of its area.
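
To illustrate what a digit-recurrence square root does, the following is a minimal software sketch, not the design proposed in the paper. It assumes the classical restoring recurrence with the non-redundant digit set 0 to 9 and exact comparisons, whereas an SRT design uses a redundant digit set and selects each digit from estimates. The function name `digit_recurrence_sqrt` and its parameters are hypothetical. The point it shows is that one root digit is produced per iteration, using only shifts, additions, and small multiples rather than a full multiplication.

```python
def digit_recurrence_sqrt(radicand: int, n: int) -> tuple[int, int]:
    """Return (root, remainder) with root = floor(sqrt(radicand)).

    Illustrative restoring recurrence, assumed here for clarity; it is
    not the SRT digit selection of the paper. The radicand is treated
    as 2*n decimal digits, so the root has n decimal digits.
    """
    if n <= 0 or radicand < 0 or radicand >= 10 ** (2 * n):
        raise ValueError("radicand must fit in 2*n decimal digits")

    root = 0
    remainder = 0
    for j in range(n):
        # Bring down the next pair of radicand digits (most significant first).
        shift = 2 * (n - 1 - j)
        pair = (radicand // 10 ** shift) % 100
        remainder = remainder * 100 + pair

        # Pick the largest digit d with (20*root + d) * d <= remainder.
        # d = 0 always satisfies this, so the loop terminates.
        digit = 9
        while (20 * root + digit) * digit > remainder:
            digit -= 1

        # Update the remainder and append the new root digit.
        remainder -= (20 * root + digit) * digit
        root = 10 * root + digit

    return root, remainder


if __name__ == "__main__":
    # Eight digits of sqrt(2), scaled: radicand = 2 * 10**14, n = 8.
    r, rem = digit_recurrence_sqrt(2 * 10 ** 14, 8)
    print(r, rem)
```

The loop runs exactly n times for an n-digit root, which parallels the cycle count quoted above (n iterations plus a small constant overhead). In hardware, the exhaustive search over digits would be replaced by a selection function.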
