Abstract

Next-generation industrial edge artificial intelligence (AI) applications will undoubtedly emerge on energy-efficient, highly integrated platforms incorporating diverse sensors, processors, and functional modules. How to effectively bridge sensor data captured by a microprocessor or central processing unit to dedicated processing modules remains an open problem in both academia and industry. Embedding a data quantization module on edge AI chips to connect sensors, processors, and functional modules is therefore critical to achieving adaptive transformation among diverse data representation formats. This paper proposes a novel adaptive, low-power quantization technique and systematically validates its effectiveness, from algorithm to hardware module, for industrial IoT applications covering precise navigation for autonomous vehicles and accurate classification with deep neural networks (DNNs). The proposed method combines an adaptive floating-point-to-fixed-point conversion function with an adaptive radix-point determination function, ensuring adequate resolution and minimal quantization loss for the fixed-point inputs to the edge AI modules. Experimental results demonstrate that the quantization error of the proposed technique contributes negligible error to the navigation solutions of a strapdown inertial navigation system and to the DNNs' top-1 and top-5 classification accuracy (on the order of $10^{-8}$ and $10^{-7}$, respectively). Moreover, a quantization-on-multiplier (QoM) hardware module is designed, synthesized, and routed according to the proposed technique. Simulation results indicate that the QoM's power consumption and area are 0.1 mW and 649.552 $\mu m^{2}$, accounting for 5% and 14.15% of the overall design's power consumption and area, respectively. With the proposed on-chip quantization technique, the time required to quantize DNN parameters is up to 1142 times shorter than with existing benchmark off-chip quantization approaches.
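The paper's exact conversion and radix-point determination functions are defined in the body of the article; purely as an illustration of the general idea, the sketch below shows one plausible adaptive fixed-point quantizer that picks the radix point from the data's dynamic range and then rounds to the nearest representable step. All function names and the range-based radix-point rule here are assumptions for exposition, not the authors' algorithm.

```python
import numpy as np

def adaptive_radix_point(x, word_bits=16):
    """Choose the number of fractional bits so the largest magnitude
    in x still fits in the integer part of a signed fixed-point word.
    This conservative range-based rule is an illustrative assumption,
    not the paper's radix-point determination function."""
    max_abs = np.max(np.abs(x))
    # Bits needed for the integer part (small epsilon guards log2(0)).
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))) + 1)
    frac_bits = word_bits - 1 - int_bits  # one bit reserved for the sign
    return max(frac_bits, 0)

def float_to_fixed(x, word_bits=16):
    """Quantize floats to signed fixed-point with an adaptively chosen
    radix point; return the dequantized values and the radix point."""
    frac_bits = adaptive_radix_point(x, word_bits)
    scale = 2.0 ** frac_bits
    lo, hi = -2 ** (word_bits - 1), 2 ** (word_bits - 1) - 1
    q = np.clip(np.round(x * scale), lo, hi).astype(np.int64)
    return q / scale, frac_bits

if __name__ == "__main__":
    # Example: quantize a batch of DNN-like weights and report the error.
    weights = np.random.randn(1000)
    xq, fb = float_to_fixed(weights, word_bits=16)
    print(f"fractional bits: {fb}, "
          f"max abs error: {np.max(np.abs(weights - xq)):.2e}")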
