The small floating-point (SFP) multiplier proposed by Xilinx is used to implement convolutional neural networks (CNNs). This scheme balances the usage of look-up tables (LUTs) and digital signal processing blocks (DSPs), so that high compute density is achieved on field-programmable gate arrays (FPGAs). In addition, it can quantize CNNs with a few simple scaling operations rather than a lengthy, compute-intensive retraining process. However, the mantissa field of the SFP multiplier must be no wider than 3 bits, which significantly restricts the application of this scheme. To address this issue, we implement the SFP multiplier in the logarithmic domain, so that multiplication is performed by addition with the aid of logarithmic and anti-logarithmic converters; we refer to the result as the small logarithmic floating-point (SLFP) multiplier. Compared to the SFP multiplier (3-bit mantissa), the proposed SLFP multiplier supports multiple accuracy levels ($3\sim5$-bit mantissa) with a relatively low overhead ($0\sim3\times$ LUT6s). Moreover, we use the look-ahead carry chain to reduce the delay of addition, so that the proposed SLFP multiplier can operate at 650 MHz. 
As a result, the latency (1.5 ns, one clock cycle) and throughput (650 MOPS) of the proposed SLFP multiplier ($3\sim5$-bit mantissa) are the same as those of the SFP multiplier (3-bit mantissa). Finally, an implementation of MobileNet shows that the accuracy level of the SFP multiplier (3-bit mantissa) is not sufficient, a shortfall that the proposed SLFP multiplier (5-bit mantissa) resolves.
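
The log-domain idea can be illustrated with a minimal Python sketch. It is not the hardware design described here: the table-based converters, the 7-bit log fraction `F`, and the names `slfp_multiply` and `to_float` are all assumptions made for illustration, and sign, zero, and exponent range are ignored. A positive operand is taken to be $2^{e}(1 + m/2^{M})$ with an $M$-bit mantissa $m$.

```python
import math

M = 5  # mantissa bits (the abstract cites 3 to 5)
F = 7  # ASSUMED number of fractional bits in the log-domain value

# Logarithmic converter (assumed LUT): mantissa -> quantized log2(1 + m / 2**M)
LOG_TABLE = [round(math.log2(1 + m / 2**M) * 2**F) for m in range(2**M)]
# Anti-logarithmic converter (assumed LUT): log fraction -> mantissa
ANTILOG_TABLE = [round((2 ** (f / 2**F) - 1) * 2**M) for f in range(2**F)]


def slfp_multiply(e1, m1, e2, m2):
    """Multiply two positive values 2**e * (1 + m / 2**M) by adding their logs."""
    # One fixed-point addition replaces the multiplication; a carry out of
    # the fraction field propagates into the exponent automatically.
    total = ((e1 + e2) << F) + LOG_TABLE[m1] + LOG_TABLE[m2]
    exp, frac = total >> F, total & (2**F - 1)
    mant = ANTILOG_TABLE[frac]
    if mant == 2**M:  # table rounding reached 2.0, so renormalize
        exp, mant = exp + 1, 0
    return exp, mant


def to_float(e, m):
    """Decode (exponent, mantissa) back to a Python float for checking."""
    return 2.0**e * (1 + m / 2**M)
```

For example, `to_float(*slfp_multiply(0, 16, 0, 16))` approximates 1.5 × 1.5. The result is approximate in general because both tables round to a fixed number of bits, and widening `M` or `F` trades table size for accuracy.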