Abstract

Advances in deep neural networks (DNNs) and the availability of massive real-world data have enabled superhuman levels of accuracy on many AI tasks and ushered in the explosive growth of AI workloads across the spectrum of computing devices. However, this superior accuracy comes at a high computational cost, which necessitates approaches beyond traditional computing paradigms to improve operational efficiency. Leveraging the application-level insight of error resilience, we demonstrate how approximate computing (AxC) can significantly boost the efficiency of AI platforms and play a pivotal role in the broader adoption of AI-based applications and services. To this end, we present RaPiD, a multi-tera-operations-per-second (TOPS) AI hardware accelerator core (fabricated in 14-nm technology) that we built from the ground up using AxC techniques across the stack, including algorithms, architecture, programmability, and hardware. We highlight the workload-guided, systematic exploration of AxC techniques for AI employed in the RaPiD accelerator, including custom number representations, quantization/pruning methodologies, mixed-precision architecture design, instruction sets, and compiler technologies with quality programmability.
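To make the error-resilience insight concrete, the sketch below shows symmetric per-tensor int8 quantization, one representative instance of the quantization methodologies the abstract alludes to. It is an illustrative assumption, not RaPiD's actual numeric scheme: the function names and the per-tensor scaling choice are ours, introduced only to show how DNN values tolerate aggressive precision reduction.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization of float32 values to int8 (illustrative sketch)."""
    # Map the largest magnitude to the int8 range; guard against all-zero tensors.
    scale = max(np.max(np.abs(x)), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from its int8 representation."""
    return q.astype(np.float32) * scale

# Usage: weights survive a 4x reduction in bit width with small reconstruction
# error, illustrating the error resilience that AxC techniques exploit.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, s))))
```

In practice, hardware such as the accelerator described here pairs reduced-precision arithmetic like this with architectural support (mixed-precision datapaths, instruction sets) so the accuracy loss stays within application tolerances.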
