STRAIT: Self-Test and Self-Recovery for AI Accelerator

Hayoung Lee, Jihye Kim, Jongho Park, Sungho Kang

doi:10.1109/tcad.2023.3236875

Abstract

As the demand for data-intensive analytics has increased with rapid advances in artificial intelligence (AI), various AI accelerators have been proposed. However, as AI-based solutions have been adopted in applications requiring accuracy and reliability, their reliability has become a critical issue. For this reason, self-test and self-recovery for AI accelerator (STRAIT) is proposed in this paper. It facilitates self-test, self-diagnosis, and self-recovery by utilizing the structural and operational characteristics of the systolic array in an AI accelerator. The proposed self-test is performed using scan chains composed of functional paths, and can achieve 100% test coverage (for both stuck-at and transition-delay faults) with a small number of test patterns and reduced test power. The proposed self-diagnosis runs together with the proposed self-test in real time, and allows accurate fault localization with fault type analysis. The proposed self-recovery is performed by efficiently pruning faulty processing elements with weight allocation, so the reliability of AI accelerators can increase drastically with negligible performance degradation. Moreover, STRAIT can be implemented with a small area overhead.
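
The idea of pruning faulty processing elements (PEs) with weight allocation can be illustrated with a minimal sketch. It is not the STRAIT implementation: the abstract does not specify the allocation algorithm, so everything below is an assumption made for illustration. The sketch assumes a weight-stationary array in which PE (r, c) holds one weight, a faulty PE is bypassed so that its weight is effectively pruned to zero, and weight columns may be assigned to physical array columns in any order. The function names (pruned_cost, allocate_columns, matvec_with_pruning) and the brute-force search are hypothetical and only practical for very small arrays.

```python
from itertools import permutations


def pruned_cost(weights, faulty, perm):
    """Total magnitude of the weights that land on faulty PEs.

    perm[c] is the logical weight column placed on physical column c
    (an assumed mapping, not taken from the paper).
    """
    return sum(abs(weights[r][perm[c]]) for (r, c) in faulty)


def allocate_columns(weights, faulty):
    """Pick the column assignment that prunes the least weight magnitude.

    Brute force over all permutations: fine for a toy array, not for a real one.
    """
    n_cols = len(weights[0])
    return min(permutations(range(n_cols)),
               key=lambda p: pruned_cost(weights, faulty, p))


def matvec_with_pruning(weights, x, faulty, perm):
    """Compute y = x @ W with faulty PEs bypassed (their weights act as zero)."""
    n_rows, n_cols = len(weights), len(weights[0])
    y = [0.0] * n_cols
    for c in range(n_cols):
        logical = perm[c]
        acc = 0.0
        for r in range(n_rows):
            if (r, c) in faulty:
                continue  # bypassed PE contributes nothing
            acc += x[r] * weights[r][logical]
        y[logical] = acc
    return y


if __name__ == "__main__":
    weights = [[0.9, 0.01],
               [0.5, 0.7]]
    faulty = {(0, 0)}          # hypothetical fault map from self-diagnosis
    perm = allocate_columns(weights, faulty)
    print(perm)
    print(matvec_with_pruning(weights, [1.0, 1.0], faulty, perm))
```

In this toy case the search places the second weight column on the physical column containing the faulty PE, so the pruned weight is 0.01 rather than 0.9. This shows how a suitable allocation can keep the accuracy loss from pruning small.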

Full Text