Binary Neural Networks (BNNs) are rapidly gaining attention for their ability to drastically shrink model size, which mitigates the fundamental “memory wall” bottleneck of existing von Neumann architectures. This work investigates how principles from approximate computing can be employed to further optimize BNNs. It demonstrates that HW/SW codesign, in which BNNs are proactively trained in the presence of approximation-induced errors (design-time optimization) and/or augmented with an appropriate error-mitigation scheme (run-time optimization), is key to realizing energy-efficient yet robust BNNs. We show, for the first time, that although the underlying hardware of BNNs can be implemented using simple XNOR gates, the complexity of the required “Popcount” circuit grows super-linearly with the filter kernel size. This substantially increases the area footprint, inference time, and energy, and hence severely constricts the prospective efficiency gains of BNNs. To overcome this challenge, we replace the accurate full adders that constitute the Popcount with Majority gates that perform the required additions approximately. A carefully crafted error-mitigation scheme, combined with activation tuning, then considerably reduces the induced errors. Afterward, abstracted error probabilities are derived and employed during BNN training to obtain approximation-aware BNNs that are inherently robust to the underlying hardware approximation. Unlike typical approaches, the proposed HW/SW codesign methodology allows the approximate BNN to be trained without modifying existing software frameworks (e.g., PyTorch). This is important because existing tools rely on efficient built-in functions that can be difficult and/or inefficient to modify. An FPGA-based SoC realizing both accurate and approximation-aware BNNs is developed to validate the proposed methodology. With merely a 4.7% loss in inference accuracy, our HW/SW codesign achieves 64% area and 80.2% energy savings at parity of latency. Our results are obtained using commercial EDA tool flows and a commercial 28 nm FDSOI technology node.
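To make the datapath concrete, the following is a minimal bit-level sketch of the XNOR-plus-Popcount dot product, with the Popcount built from full adders used as 3:2 compressors. The majority-based approximate full adder shown here (exact carry via MAJ, sum approximated as the complement of the carry) is a well-known design from the approximate-computing literature and is only a stand-in for the paper's unspecified gate-level circuit; the vector length and the use of Python for simulation are likewise illustrative assumptions.

```python
from random import randint

def maj(a, b, c):
    """3-input majority gate: 1 iff at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)

def fa_exact(a, b, c):
    """Accurate full adder: (sum, carry)."""
    return a ^ b ^ c, maj(a, b, c)

def fa_approx(a, b, c):
    """Majority-based approximate full adder (an assumption, see text):
    the carry is the exact MAJ output, the sum is approximated as
    NOT(carry), which is wrong only for the input patterns 000 and 111."""
    carry = maj(a, b, c)
    return 1 - carry, carry

def popcount(bits, fa):
    """Popcount as a carry-save tree: full adders compress triples of
    equal-weight bits into one sum bit (same weight) and one carry bit
    (next weight). The <=2 leftover bits per weight, which a real circuit
    would feed to a small final adder, are summed numerically here."""
    cols = {0: list(bits)}
    w, total = 0, 0
    while w in cols:
        col = cols[w]
        while len(col) >= 3:
            s, c = fa(col.pop(), col.pop(), col.pop())
            col.append(s)
            cols.setdefault(w + 1, []).append(c)
        total += sum(col) << w
        w += 1
    return total

n = 3 * 3 * 64                                   # illustrative kernel volume
acts = [randint(0, 1) for _ in range(n)]         # binarized activations
wts = [randint(0, 1) for _ in range(n)]          # binarized weights
xnor = [1 - (a ^ b) for a, b in zip(acts, wts)]  # bitwise XNOR
exact, approx = popcount(xnor, fa_exact), popcount(xnor, fa_approx)
print(f"dot product (exact) = {2 * exact - n}")  # map popcount to {-1,+1} dot
print(f"popcount: exact = {exact}, approximate = {approx}")
```

With the approximate adder, each erring compressor perturbs the count by plus or minus one unit of its column weight, so individual errors remain bounded; this is the kind of behavior that abstracted error probabilities can summarize for training.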
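The training-side idea, injecting derived error probabilities without touching PyTorch internals, can likewise be sketched with standard tensor operations only. Everything here is an assumption for illustration: ApproxNoise is a hypothetical module name, the symmetric fixed-magnitude error model with a single probability p is a stand-in for the paper's abstracted error probabilities, and BinaryConv2d in the usage comment is a placeholder for whatever binarized layer the model uses.

```python
import torch
import torch.nn as nn

class ApproxNoise(nn.Module):
    """Emulates approximate-Popcount errors during training by perturbing
    pre-activations with standard tensor ops, so no PyTorch built-in
    function has to be modified. Hypothetical module; the error model
    (probability p, fixed +/- magnitude) is an illustrative assumption."""

    def __init__(self, p=0.05, magnitude=2.0):
        super().__init__()
        self.p = p                  # abstracted per-output error probability
        self.magnitude = magnitude  # size of one injected popcount error

    def forward(self, x):
        if not self.training or self.p == 0.0:
            return x                # inference path stays exact
        flip = torch.bernoulli(torch.full_like(x, self.p))  # error locations
        sign = 2.0 * torch.randint_like(x, 2) - 1.0         # error direction
        # The sampled noise term carries no gradient, so backpropagation sees
        # the clean computation while the forward pass sees HW-like errors.
        return x + flip * sign * self.magnitude

# Usage sketch: drop the module after each (hypothetical) binarized layer.
# model = nn.Sequential(BinaryConv2d(64, 64, 3), ApproxNoise(p=0.05),
#                       nn.BatchNorm2d(64), ...)
```

Because the perturbation is resampled on every forward pass, training with it resembles noise-injection regularization, which is consistent with the claim that the resulting BNNs become inherently robust to the hardware approximation.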