Abstract

Parameter quantization with lower bit widths is a common approach to reducing the computation load in CNN inference. With the parameters replaced by fixed-width binaries, multiplication operations can be replaced by a look-up table (LUT), where the multiplier-multiplicand operands serve as the table index and the precalculated products serve as the table elements. Because the histogram profiles of the parameters differ significantly across layers/channels in a CNN, previous LUT-based computation methods have to use a different LUT for each layer/channel, and consequently demand larger memory space along with extra access time and power consumption. In this work, we first normalize the parameters' Gaussian profiles across layers/channels to have similar means and variances, and then quantize the normalized parameters to a fixed width through nonlinear quantization. Because of the normalized parameter profile, we can use a single compact LUT ($16\times 16$ entries) to replace all multiplication operations in the whole network. Furthermore, the normalization procedure also reduces the errors induced by quantization. Experiments demonstrate that with a compact 256-entry LUT, we achieve accuracy comparable to the results from 32-bit floating-point calculation, while significantly reducing the computation load and memory space, along with power consumption and hardware resources.
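To make the mechanism concrete, the following is a minimal sketch of the shared-LUT idea: per-layer/channel weights are standardized to a common profile, nonlinearly quantized to 4-bit indices, and every multiplication becomes a lookup into one 16×16 table of precomputed products. The specific normalization and quantization codebook below (simple standardization and a Gaussian-like geometric level spacing) are illustrative assumptions, not the exact scheme used in the paper.

```python
import numpy as np

NBITS = 4                 # 4-bit operands -> 16 x 16 = 256-entry LUT
NLEVELS = 1 << NBITS

# Hypothetical nonlinear quantization levels, denser near zero to match a
# normalized (zero-mean, unit-variance) Gaussian parameter profile.
levels = np.sort(np.concatenate([
    -np.geomspace(0.05, 3.0, NLEVELS // 2),
     np.geomspace(0.05, 3.0, NLEVELS // 2),
]))

def normalize(w):
    """Standardize a weight/activation tensor so all layers/channels share one profile."""
    return (w - w.mean()) / (w.std() + 1e-8)

def quantize(x):
    """Map normalized values to 4-bit indices of the nearest codebook level."""
    return np.abs(x[..., None] - levels).argmin(axis=-1).astype(np.uint8)

# Single shared 16 x 16 LUT of precomputed products of codebook levels.
LUT = np.outer(levels, levels)

def lut_multiply(a_idx, w_idx):
    """Replace a * w with a table lookup on the two 4-bit operand indices."""
    return LUT[a_idx, w_idx]

# Toy usage: elementwise products of normalized activations and weights.
a = normalize(np.random.randn(64))
w = normalize(np.random.randn(64))
approx = lut_multiply(quantize(a), quantize(w))
exact = a * w
print("mean abs error:", np.abs(approx - exact).mean())
```

Because every layer/channel is mapped onto the same normalized profile, the same 256-entry table serves the entire network, which is what removes the per-layer/channel LUT storage and access overhead described above.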
