Abstract

We propose a 9-bit floating-point format for training image-classification deep convolutional neural networks. The format comprises a 5-bit exponent, a 3-bit mantissa with a hidden most significant bit (MSB), and a sign bit. The 9-bit format reduces not only the transistor count of the multiplier in the multiply-accumulate (MAC) unit but also the data traffic for forward propagation, backward propagation, and weight updates; both reductions enable power-efficient training. To maintain validation accuracy, the accumulator is implemented with a longer internal floating-point format, while the multiplier accepts the 9-bit format. We evaluated this format by training AlexNet and ResNet-50 on the ILSVRC 2012 data set. The 9-bit-trained AlexNet and ResNet-50 achieved validation accuracies 1.2% and 0.5% higher, respectively, than those of 16-bit floating-point training. The transistor count of the 9-bit MAC unit is estimated to be 84% lower than that of its 32-bit counterpart.
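
To illustrate the bit layout described above (1 sign bit, 5 exponent bits, 3 stored mantissa bits with a hidden MSB), the following is a minimal sketch of a float32-to-9-bit quantizer and its decoder. It assumes an IEEE-754-style exponent bias of 15 (as in fp16), round-to-nearest with ties away from zero, subnormals flushed to zero, and overflow clamped to the largest finite value; the paper's exact rounding and special-value handling are not specified here, so these choices are assumptions.

```python
import struct

def float_to_fp9(x: float) -> int:
    """Quantize a float to a 9-bit layout: 1 sign, 5 exponent, 3 mantissa.

    Assumptions (not taken from the paper): exponent bias 15 as in fp16,
    round-to-nearest (ties away from zero), subnormals flushed to zero,
    overflow clamped to the largest finite value.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp32 = (bits >> 23) & 0xFF          # float32 exponent field, bias 127
    man32 = bits & 0x7FFFFF              # 23-bit float32 mantissa field

    if exp32 == 0:                       # float32 zero/subnormal -> signed zero
        return sign << 8
    exp9 = exp32 - 127 + 15              # re-bias from 127 to 15
    man9 = man32 >> 20                   # keep the top 3 mantissa bits
    if man32 & (1 << 19):                # round on the most significant dropped bit
        man9 += 1
        if man9 == 8:                    # mantissa carry propagates into exponent
            man9 = 0
            exp9 += 1
    if exp9 <= 0:                        # underflow -> signed zero
        return sign << 8
    if exp9 >= 31:                       # overflow -> largest finite value
        exp9, man9 = 30, 7
    return (sign << 8) | (exp9 << 3) | man9

def fp9_to_float(v: int) -> float:
    """Decode a 9-bit value back to a Python float."""
    sign = -1.0 if (v >> 8) & 1 else 1.0
    exp9 = (v >> 3) & 0x1F
    man9 = v & 0x7
    if exp9 == 0:
        return sign * 0.0
    return sign * (1.0 + man9 / 8.0) * 2.0 ** (exp9 - 15)
```

For example, `float_to_fp9(1.5)` yields `0b0_01111_100` (sign 0, biased exponent 15, mantissa 4), and `fp9_to_float` maps it back to 1.5 exactly; values with more than three significant mantissa bits are rounded.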
