Abstract
We present a technique that generates high-quality functional in-field self-tests specifically targeting deep learning (DL) accelerators. These functional tests can be applied in the field during normal operation of a DL accelerator, which is crucial to ensure that the safety and/or reliability requirements are met for any given application, including safety-critical applications such as self-driving cars, robotics, and more.

Our technique takes advantage of special architectural characteristics and application properties to achieve high functional test coverage while incurring minimal system-level costs. Moreover, we devise different strategies for the compute units (which support computation operations) and the control units (which control data movement) because these two types of units exhibit different properties. For the compute units of a DL accelerator, we first use combinational ATPG to generate test patterns with high test coverage, which is possible because these units do not contain complex sequential logic. Next, we map the ATPG patterns to one or more equivalent deep neural networks (DNNs) that can be directly executed on the accelerator, which is possible given the well-defined dataflow/reuse algorithm of a DL accelerator. For the control units, we leverage the property that typically only one or a few fixed DNNs are deployed at a time in many application domains (e.g., self-driving cars). Thus, it is sufficient to target only the faults that can directly affect the correctness of the DNNs that are currently deployed. This is done by executing different layers of each target DNN using carefully crafted input and weight values to maximize test coverage while minimizing test time.

We apply our technique using Nvidia’s open-source accelerator as a case study to demonstrate its efficacy. Our results show that our technique achieves high test coverage. For the compute units, 99.9% single stuck-at functional test coverage is achieved. For the control units, we are able to prove that, given any target DNN, 100% coverage can be achieved for a large class of single and multiple fault models. The in-field functional self-test time is also very low, < 17 ms for various representative DNNs. These functional tests can be applied during boot-up, reset, and even concurrently with normal operation by executing DNN test programs directly on the accelerator, without requiring any test support in the hardware.
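As a rough illustration of the compute-unit idea (ATPG operand patterns re-expressed as layers that the accelerator executes, with the result checked against a software golden value), the following toy sketch may help. It is not the mapping used in this work: the lane model, the broadcast of one pattern to every lane, and all names (`patterns_to_layers`, `run_layer`, `self_test`, `stuck_at_0_bit3`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Pattern = Tuple[int, int]               # hypothetical (activation, weight) pair from ATPG
Multiplier = Callable[[int, int], int]  # behavioural model of one multiplier lane
Layer = Tuple[List[int], List[int]]     # (activation vector, weight vector)


def patterns_to_layers(patterns: Sequence[Pattern], lanes: int) -> List[Layer]:
    """Turn each operand pair into a tiny layer that drives every lane with it."""
    return [([a] * lanes, [w] * lanes) for a, w in patterns]


def run_layer(layer: Layer, multipliers: Sequence[Multiplier]) -> int:
    """Toy accelerator: one multiplier per lane, products summed by an accumulator."""
    acts, wts = layer
    return sum(m(a, w) for m, a, w in zip(multipliers, acts, wts))


def golden(layer: Layer) -> int:
    """Fault-free reference result computed in software."""
    acts, wts = layer
    return sum(a * w for a, w in zip(acts, wts))


def self_test(layers: Sequence[Layer], multipliers: Sequence[Multiplier]) -> bool:
    """Pass only if every test layer matches its golden output."""
    return all(run_layer(layer, multipliers) == golden(layer) for layer in layers)


def healthy(a: int, w: int) -> int:
    return a * w


def stuck_at_0_bit3(a: int, w: int) -> int:
    """Assumed fault model: bit 3 of the product is stuck at 0."""
    return (a * w) & ~0b1000


LANES = 4
layers = patterns_to_layers([(1, 1), (3, 5)], LANES)
good_unit = [healthy] * LANES
bad_unit = [healthy, healthy, stuck_at_0_bit3, healthy]

print(self_test(layers, good_unit))  # True
print(self_test(layers, bad_unit))   # False: pattern (3, 5) sets product bit 3
```

In this toy model the pattern (1, 1) alone would not expose the injected fault, because its product never sets bit 3. This is why pattern selection for coverage is left to ATPG, and the mapping step only has to deliver each chosen pattern to the unit under test.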