Abstract

The convergence of artificial intelligence (AI) and the Internet of Things is one of the critical technologies of the recent fourth industrial revolution. The AIoT (Artificial Intelligence Internet of Things) is expected to be a solution that enables rapid and secure data processing. While the success of AIoT demands low-power neural network processors, most recent research has focused on accelerator designs for inference only. The growing interest in self-supervised and semi-supervised learning now calls for processors that offload the training process in addition to inference. Training with high accuracy goals requires floating-point operators, but higher-precision floating-point arithmetic architectures in neural networks tend to consume a large area and much energy. Consequently, an energy-efficient, compact accelerator is required. The proposed architecture supports training in 32-bit, 24-bit, 16-bit, and mixed precisions to find the optimal floating-point format for low-power, small-footprint edge devices. The proposed accelerator engines have been verified on an FPGA for both inference and training on the MNIST image dataset. The combination of a 24-bit custom FP format with 16-bit Brain FP achieved an accuracy of more than 93%. ASIC implementation of this optimized mixed-precision accelerator in TSMC 65 nm technology shows an active area of 1.036 × 1.036 mm² and an energy consumption of 4.445 µJ per training of one image. Compared with the 32-bit architecture, the size and the energy are reduced by factors of 4.7 and 3.91, respectively. Therefore, a CNN structure using floating-point numbers with an optimized data path will contribute significantly to the AIoT field, which requires small area, low energy, and high accuracy.
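
To make the precision trade-off concrete, the following C sketch packs a sample weight into the two reduced formats named above. The bfloat16 layout (1 sign, 8 exponent, 7 mantissa bits) is standard; the field split assumed here for the 24-bit custom format (1 sign, 8 exponent, 15 mantissa bits, i.e., FP32 with the low mantissa bits dropped) is an illustrative assumption, since the abstract does not spell out the paper's exact layout.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Truncate an IEEE-754 single-precision value to a narrower format
       that keeps the same 8-bit exponent and drops low mantissa bits.
       bfloat16 keeps 7 mantissa bits; the assumed custom FP24 keeps 15. */
    static uint32_t fp32_truncate(float x, int mantissa_bits_kept)
    {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);      /* safe bit reinterpretation */
        return bits >> (23 - mantissa_bits_kept);
    }

    int main(void)
    {
        float w = 0.15625f;                    /* a sample weight          */
        uint32_t bf16 = fp32_truncate(w, 7);   /* 16-bit container         */
        uint32_t fp24 = fp32_truncate(w, 15);  /* assumed 24-bit format    */
        printf("bfloat16: 0x%04x  fp24: 0x%06x\n",
               (unsigned)bf16, (unsigned)fp24);
        return 0;
    }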

Highlights

  • The Internet of Things (IoT) is a core technology leading the fourth industrial revolution through the convergence and integration of various advanced technologies

  • Most of the deep neural network training models are still based on the backpropagation algorithm, which propagates the errors from the output layer backward and updates the variables layer by layer with the gradient descent-based optimization algorithms (see the sketch after this list)

  • This paper evaluated different floating-point formats and optimized the FP operators in the Convolutional Neural Network Training/Inference engine
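
The backpropagation highlight above can be made concrete with a minimal C sketch: one sigmoid neuron trained by gradient descent on a single sample. The squared-error loss, learning rate, and data here are illustrative choices, not the paper's configuration; a full CNN repeats the same forward/backward/update loop layer by layer.

    #include <stdio.h>
    #include <math.h>

    static double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }

    int main(void)
    {
        double w = 0.5, b = 0.0;   /* trainable parameters           */
        double x = 1.0, t = 1.0;   /* one training sample and target */
        double lr = 0.1;           /* learning rate (illustrative)   */

        for (int epoch = 0; epoch < 100; ++epoch) {
            double y = sigmoid(w * x + b);          /* forward pass  */
            /* backward pass: squared loss L = 0.5*(y-t)^2, so
               dL/dz = (y - t) * y * (1 - y) through the sigmoid     */
            double dz = (y - t) * y * (1.0 - y);
            w -= lr * dz * x;                       /* gradient step */
            b -= lr * dz;
        }
        printf("output after training: %f\n", sigmoid(w * x + b));
        return 0;
    }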

Summary

Introduction

The Internet of Things (IoT) is a core technology leading the fourth industrial revolution through the convergence and integration of various advanced technologies. Conventional neural network circuit design studies have relied on floating-point operations provided by GPUs or on fixed-point computation hardware [27,28]. Most existing floating-point-based neural networks are limited to inference, and the few that incorporate training engines are aimed at high-speed servers, not low-power mobile devices. This paper evaluates different floating-point formats and their combinations to implement FP operators that provide accurate results with lower resource consumption. We have implemented a CNN (convolutional neural network) inference circuit together with a floating-point training circuit.
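
One way to evaluate such format combinations before committing them to hardware is software emulation. The sketch below, in the same illustrative spirit as the earlier ones, quantizes weights and activations to bfloat16 while accumulating in FP32; truncation is an assumed rounding mode, and the paper's operators may round to nearest instead.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Quantize an FP32 value to bfloat16 by truncating the low 16 bits.
       Truncation is an assumed rounding mode; round-to-nearest is also
       common in hardware operators. */
    static float bf16_round(float x)
    {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        bits &= 0xFFFF0000u;   /* keep sign, exponent, 7 mantissa bits */
        memcpy(&x, &bits, sizeof x);
        return x;
    }

    /* Dot product with bfloat16 operands and an FP32 accumulator. */
    static float dot_bf16(const float *w, const float *a, int n)
    {
        float acc = 0.0f;
        for (int i = 0; i < n; ++i)
            acc += bf16_round(w[i]) * bf16_round(a[i]);
        return acc;
    }

    int main(void)
    {
        float w[3] = { 0.1f, -0.2f, 0.3f };
        float a[3] = { 1.0f,  2.0f, 3.0f };
        printf("bf16-emulated dot product: %f\n", dot_bf16(w, a, 3));
        return 0;
    }

Keeping the accumulator at full precision is a common mixed-precision choice, because accumulation error grows with vector length while individual products tolerate narrower operands.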

SoftMax module
Gradient
General Floating-Point Number and Arithmetic
Variants of Floating-Point Number Formats
Division
Division calculation using Signed Array
Structure
Floating Point Multiplier
Overall Architecture of the Proposed Accelerator
CNN Structure Optimization
Comparison of Floating-Point Arithmetic Operators
Evaluation of the Proposed CNN Training Accelerator
Conclusions
