Abstract

AI evolution is accelerating, and Deep Neural Network (DNN) inference accelerators are at the forefront of the ad hoc architectures evolving to support the immense throughput that AI computation demands. However, far more energy-efficient design paradigms are needed to realize the full potential of AI and curtail its energy consumption. The Near-Threshold Computing (NTC) design paradigm is a strong candidate for delivering the required energy efficiency. However, NTC operation is plagued by significant performance and reliability concerns arising from timing errors. In this paper, we dive deep into DNN architecture to uncover unique challenges and opportunities for operation in the NTC paradigm. Through rigorous simulations of a TPU systolic array, we reveal the severity of timing errors and their impact on inference accuracy at NTC. We analyze various attributes, such as the data–delay relationship, delay disparity within arithmetic units, utilization patterns, hardware homogeneity, and workload characteristics, and uncover unique localized and global techniques for dealing with timing errors at NTC.

Highlights

  • The proliferation of artificial intelligence (AI) is predicted to contribute up to $15.7 trillion to the global economy by 2030 [1]

  • We illustrate the unique challenges posed by Deep Neural Network (DNN) workloads and characterize the occurrence and impact of timing errors

  • We discover that Near-Threshold Computing (NTC) DNN accelerators face a landslide increase in the rate of timing errors, and that the inherent algorithmic tolerance of DNNs to timing errors is quickly surpassed, causing a sharp decline in inference accuracy

Summary

Introduction

The proliferation of artificial intelligence (AI) is predicted to contribute up to $15.7 trillion to the global economy by 2030 [1]. NTC operation is highly sensitive to process and environmental variations, resulting in an excessive increase in delay and delay variation [14]. This slows down performance and induces a high rate of timing errors in the DNN accelerator. The unique challenges stem from the nature of DNN algorithms, which operate through complex interleaving among a very large number of dataflow pipelines. Amidst these challenges, the homogeneous repetition of the basic functional units throughout the architecture bestows unique opportunities to deal with timing errors. We identify these opportunities in NTC DNN accelerators by delving deep into architectural attributes such as utilization patterns, hardware homogeneity, sensitization delay profiles, and DNN workload characteristics.
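The mechanism described above, where timing errors in individual multiply-accumulate (MAC) units propagate through a systolic array's dataflow pipelines and corrupt outputs, can be illustrated with a toy fault-injection model. This is a minimal sketch and not the paper's simulator: the error model (a single high-order bit flip in a 16-bit accumulator with probability `p_err`) and all function names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mac_with_timing_errors(a_row, b_col, p_err=0.0, bit=15):
    """Accumulate a dot product one MAC step at a time. With probability
    p_err, each partial sum suffers a timing error, modeled here as a
    single-bit flip in a 16-bit fixed-point accumulator (toy model)."""
    acc = 0
    for a, b in zip(a_row, b_col):
        acc = (acc + int(a) * int(b)) & 0xFFFF
        if rng.random() < p_err:
            acc ^= 1 << bit  # corrupt a high-order accumulator bit
    return acc

def systolic_matmul(A, B, p_err=0.0):
    """Toy output-stationary array: one MAC chain per output element."""
    n, m = A.shape[0], B.shape[1]
    C = np.zeros((n, m), dtype=np.int64)
    for i in range(n):
        for j in range(m):
            C[i, j] = mac_with_timing_errors(A[i, :], B[:, j], p_err)
    return C

A = rng.integers(0, 8, size=(8, 8))
B = rng.integers(0, 8, size=(8, 8))

golden = systolic_matmul(A, B, p_err=0.0)
for p in (0.001, 0.01, 0.1):
    faulty = systolic_matmul(A, B, p_err=p)
    mismatch = np.mean(golden != faulty)
    print(f"p_err={p:<6} fraction of corrupted outputs: {mismatch:.2f}")
```

Sweeping `p_err` upward mimics scaling the supply voltage deeper toward the near-threshold regime: even modest per-MAC error rates corrupt a large fraction of output activations, which is consistent with the rapid erosion of DNN accuracy once the algorithmic error tolerance is surpassed.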

Background
Challenges for NTC DNN Accelerators
Unique Performance Challenge
Timing Error Detection and Handling
Opportunities for NTC DNN Accelerators
Predictive Opportunities
Opportunities from Novel Timing Error Handling
Opportunities from Hardware Utilization Trend
Device Layer
Circuit Layer
Architecture Layer
Related Works
Enhancements around Memory
Enhancements around Architecture
Findings
Conclusions