Abstract

Convolutional neural networks (CNNs) are essential for advancing the field of artificial intelligence. However, since these networks are highly demanding in terms of memory and computation, implementing CNNs can be challenging. To make CNNs more accessible to energy-constrained devices, researchers are exploring new algorithmic techniques and hardware designs that can reduce memory and computation requirements. In this work, we present self-gating float (SG-Float), algorithm hardware co-design of a novel binary number format, which can significantly reduce memory access and computing power requirements in CNNs. SG-Float is a self-gating format that uses the exponent to self-gate the mantissa to zero, exploiting the characteristic of floating-point that the exponent determines the magnitude of a floating-point value and the error tolerance property of CNNs. SG-Float represents relatively small values using only the exponent, which increases the proportion of ineffective mantissas, corresponding to reducing mantissa multiplications of floating-point numbers. To minimize the accuracy loss caused by the approximation error introduced by SG-Float, we propose a fine-tuning process to determine the exponent thresholds of SG-Float and reclaim the accuracy loss. We also develop a hardware optimization technique, called the SG-Float buffering strategy, to best match SG-Float with CNN accelerators and further reduce memory access. We apply the SG-Float buffering strategy to vector-vector multiplication processing elements (PEs), which NVDLA adopts, in TSMC 40nm technology. Our evaluation results demonstrate that SG-Float can achieve up to 35% reduction in memory access power and up to 54% reduction in computing power compared with AdaptivFloat, a state-of-the-art format, with negligible power and area overhead. Additionally, we show that SG-Float can be combined with neural network pruning methods to further reduce memory access and mantissa multiplications in pruned CNN models. Overall, our work shows that SG-Float is a promising solution to the problem of CNN memory access and computing power.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call