To satisfy the massive computational requirements of Convolutional Neural Networks (CNNs), various accelerators based on Domain-Specific Architectures have been deployed in large-scale systems. While they improve performance significantly, the high integration density of these accelerators makes them much more susceptible to soft errors, which propagate and are amplified layer by layer during CNN execution, ultimately disturbing the CNN's decision and leading to catastrophic consequences. As CNNs are increasingly deployed in security-critical areas, their reliable execution demands more attention. Although classical fault-tolerant approaches are effective against errors, they introduce non-negligible performance and energy overheads, which runs counter to the CNN accelerator design philosophy. In this article, we leverage CNNs' intrinsic tolerance of minor errors and the similarity of filters within a layer to explore Approximate Fault Tolerance opportunities for reducing the fault tolerance overhead of CNN accelerators. By clustering the filters into several check groups and performing an inexact check that still mitigates serious errors, our approximate fault tolerance design reduces fault tolerance overhead significantly. Furthermore, we remap the filters to match the checking process to the dataflow of the systolic array, which satisfies the real-time checking demands of CNNs. Experimental results show that our approach eliminates 73.39% of the performance degradation of baseline DMR.
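The core idea of the abstract, grouping similar filters into check groups and flagging only serious deviations, can be illustrated with a minimal sketch. The function names (`cluster_filters`, `approximate_check`), the use of 1-D k-means over filter checksums, and the tolerance threshold are all illustrative assumptions, not the paper's actual algorithm or dataflow remapping.

```python
# Hypothetical sketch of approximate fault tolerance via filter clustering.
# Instead of an exact per-filter duplicate-and-compare (as in DMR), similar
# filters share one check group, and each filter is checked loosely against
# its group's representative checksum: only large (soft-error-sized)
# deviations are flagged, while minor errors are tolerated by design.

def cluster_filters(filters, num_groups, iters=10):
    """Naive 1-D k-means on filter checksums (sum of weights).

    Deterministic initialization (first `num_groups` checksums) keeps the
    sketch reproducible; a real implementation would cluster offline on the
    full weight vectors.
    """
    sums = [sum(f) for f in filters]
    centroids = sums[:num_groups]
    for _ in range(iters):
        groups = [[] for _ in range(num_groups)]
        for i, s in enumerate(sums):
            k = min(range(num_groups), key=lambda g: abs(s - centroids[g]))
            groups[k].append(i)
        for k, g in enumerate(groups):
            if g:  # keep old centroid if a group empties out
                centroids[k] = sum(sums[i] for i in g) / len(g)
    return groups, centroids

def approximate_check(filters, groups, centroids, tol):
    """Flag filter indices whose checksum deviates from the group
    centroid by more than `tol` (a serious, likely soft-error, fault)."""
    flagged = []
    for k, g in enumerate(groups):
        for i in g:
            if abs(sum(filters[i]) - centroids[k]) > tol:
                flagged.append(i)
    return flagged

if __name__ == "__main__":
    # Four 3x3 filters (flattened); two near 0.1, two near 0.5.
    filters = [[0.10] * 9, [0.11] * 9, [0.50] * 9, [0.52] * 9]
    groups, cents = cluster_filters(filters, num_groups=2)
    filters[2][0] += 10.0  # inject a large soft-error-like corruption
    print(approximate_check(filters, groups, cents, tol=0.5))  # → [2]
```

The inexactness is deliberate: small weight perturbations stay below `tol` and incur no recovery cost, which is where the overhead reduction over exact duplication comes from.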