ARETE: Accurate Error Assessment via Machine Learning-Guided Dynamic-Timing Analysis

Ioannis Tsiokanos, Lev Mukhanov, Georgios Karakonstantis, Giorgis Georgakoudis, Styliani Tompazi

doi:10.1109/tc.2022.3191966

Abstract

Nanometer circuits are increasingly prone to timing errors, escalating the need for fault-injection frameworks that accurately evaluate the impact of these errors on applications. In this paper, we propose ARETE, a novel cross-layer fault-injection framework that combines dynamic binary instrumentation with machine learning-guided dynamic-timing analysis. ARETE enables accurate fault injection into any application by estimating the locations of the injected errors via dynamic-timing analysis. To accelerate fault injection, we develop a novel, data-aware, machine learning-based mechanism that dynamically pre-selects the error-prone instructions and restricts the costly dynamic-timing analysis to those instructions alone. To evaluate ARETE's accuracy, our fully automated toolflow is configured to support fault injection based both on detailed post-layout gate-level simulations and on existing workload-agnostic error models. Our results for various workloads, including an autonomous-driving library, show that the locations and timing of the errors injected by ARETE are 89.9% consistent with those of fault injection based on full gate-level simulation. On average, ARETE executes 84.6× faster than gate-level simulation, at the cost of a 3.4% loss in program output-quality estimation. Compared to existing statistical fault-injection tools based on workload-agnostic error models, ARETE improves the accuracy of fault-injection rate and output-quality estimation by 143.9% and 40.4% on average, respectively.
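The pre-selection mechanism described above can be illustrated with a minimal Python sketch. It assumes an instruction trace recorded by dynamic binary instrumentation, a cheap predictor that flags error-prone instructions, and an expensive dynamic-timing-analysis (DTA) step that returns the delay of the path an instruction sensitizes; an error is injected whenever that delay exceeds the clock period. All names here (`DynInstr`, `operand_toggles`, `preselect_error_prone`, `inject_timing_errors`, `toy_dta_delay_ns`) are hypothetical, and the toggle-count threshold and linear delay model are placeholders for ARETE's trained model and gate-level timing engine, which are not specified here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class DynInstr:
    """One executed instruction as an instrumentation tool might record it (hypothetical format)."""
    pc: int
    opcode: str
    prev_operands: Tuple[int, ...]  # operand values in the previous cycle
    operands: Tuple[int, ...]       # operand values in the current cycle


def operand_toggles(instr: DynInstr) -> int:
    """Count input bits that switch between consecutive operands (a simple data-aware feature)."""
    return sum(bin(p ^ c).count("1") for p, c in zip(instr.prev_operands, instr.operands))


def preselect_error_prone(instr: DynInstr, min_toggles: int = 8) -> bool:
    """Hypothetical stand-in for a trained classifier: flag high switching activity."""
    return operand_toggles(instr) >= min_toggles


def inject_timing_errors(
    trace: Sequence[DynInstr],
    dta_delay_ns: Callable[[DynInstr], float],
    clock_period_ns: float,
    preselect: Callable[[DynInstr], bool] = preselect_error_prone,
) -> Tuple[List[int], int]:
    """Return (indices of instructions that violate timing, number of DTA calls made)."""
    faulty: List[int] = []
    dta_calls = 0
    for i, instr in enumerate(trace):
        if not preselect(instr):
            continue  # predicted safe: skip the costly dynamic-timing analysis
        dta_calls += 1
        if dta_delay_ns(instr) > clock_period_ns:
            faulty.append(i)  # sensitized path is slower than the clock: inject an error here
    return faulty, dta_calls


def toy_dta_delay_ns(instr: DynInstr) -> float:
    """Toy delay model (hypothetical): delay grows linearly with switching activity."""
    return 0.5 + 0.05 * operand_toggles(instr)


if __name__ == "__main__":
    trace = [
        DynInstr(0x400, "add", (0, 0), (0, 1)),          # 1 toggle: filtered out
        DynInstr(0x404, "mul", (0, 0), (0xFFFF, 0xFF)),  # 24 toggles: about 1.7 ns
        DynInstr(0x408, "add", (0, 0), (0xFF, 0x1)),     # 9 toggles: about 0.95 ns
    ]
    faulty, calls = inject_timing_errors(trace, toy_dta_delay_ns, clock_period_ns=1.0)
    print(faulty, calls)  # [1] 2
```

With a 1 ns clock, only the second instruction violates timing, and DTA runs twice instead of three times. The sketch makes the trade-off explicit: instructions the predictor wrongly filters out become errors that are never injected, whereas instructions it wrongly pre-selects only cost an extra DTA call, because DTA still makes the final decision.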
