Hardware Implementation of Approximate Fixed-point Divider for Machine Learning Optimization Algorithm

Gandong Han, Weiyi Zhang, Zhihua Wang, Chun Zhang, Ziqiang Wang, Liting Niu

doi:10.1109/primeasia56064.2022.10104001

Abstract

Division is required by many applications, especially optimization algorithms for machine learning. When computing nonsignificant intermediate variables, a certain loss of accuracy is usually acceptable in exchange for a considerable speed improvement. This paper proposes a specialized divider to accelerate hardware implementations of machine learning optimization algorithms. Inspired by the fast inverse square root algorithm, we design a hardware implementation that generates an approximate division result using conversions between floating-point and fixed-point numbers together with multiplication. The paper presents three versions of the divider: fastDiv_accuracy, a conventional design with 35% less delay and minimal error compared to the delay-minimized standard divider from the Synopsys DesignWare library; fastDiv_area, an area-oriented design with 67% less delay and acceptable error compared to the standard divider constrained to the same area; and fastDiv_speed, the fastest design, with 54% less delay compared to the delay-minimized standard divider. All three versions can be used on demand when deploying optimization algorithms in FPGA or ASIC designs.
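The general idea of replacing a division with a bit-level reciprocal estimate and a multiplication can be illustrated in software. The sketch below is not the divider described in this paper. It is a minimal Python analogue of the fast inverse square root trick applied to the reciprocal, under several assumptions: the constant `MAGIC`, the function names, the Q16 fixed-point format, and the single Newton-Raphson refinement step are all illustrative choices, and operands are assumed to be nonzero and of moderate magnitude.

```python
import struct

# Assumed magic constant for a float32 reciprocal estimate (illustrative,
# not taken from the paper). Subtracting the operand's bit pattern from it
# approximately negates the exponent, giving a rough value of 1/x.
MAGIC = 0x7EF311C7


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(i: int) -> float:
    """Interpret an unsigned 32-bit integer as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", i))[0]


def approx_reciprocal(x: float, newton_steps: int = 1) -> float:
    """Approximate 1/x for x > 0 with a bit-level guess plus Newton steps."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Initial guess: integer subtraction on the float's bit pattern.
    y = bits_to_float(MAGIC - float_to_bits(x))
    # Each Newton-Raphson step y <- y * (2 - x*y) roughly squares the
    # relative error and needs only multiplications and a subtraction.
    for _ in range(newton_steps):
        y = y * (2.0 - x * y)
    return y


def approx_div_fixed(a: int, b: int, frac_bits: int = 16) -> int:
    """Approximate a / b, where a, b and the result are signed integers
    holding fixed-point values with frac_bits fractional bits."""
    if b == 0:
        raise ZeroDivisionError("fixed-point division by zero")
    sign = -1 if (a < 0) != (b < 0) else 1
    scale = 1 << frac_bits
    # Fixed-point -> floating-point conversion of the divisor magnitude.
    recip = approx_reciprocal(abs(b) / scale)
    # Floating-point -> fixed-point conversion of the reciprocal.
    recip_fixed = int(round(recip * scale))
    # The division itself is replaced by one integer multiplication.
    return sign * ((abs(a) * recip_fixed) >> frac_bits)


if __name__ == "__main__":
    scale = 1 << 16
    q = approx_div_fixed(int(7.5 * scale), int(2.5 * scale))
    print(q / scale)  # close to 3.0, with a small approximation error
```

With this constant, the initial guess is accurate to within a few percent and one refinement step brings the relative error down to a fraction of a percent. Raising `newton_steps` trades extra multiplications for accuracy, a software analogue of the accuracy, area, and speed trade-offs among the three divider versions.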

Full Text