EXTREM-EDGE—EXtensions To RISC-V for Energy-efficient ML inference at the EDGE of IoT

Vaibhav Verma, Tommy Tracy II, Mircea R. Stan

doi:10.1016/j.suscom.2022.100742

Abstract

Artificial intelligence (AI) and machine learning (ML) have emerged as the fastest-growing workloads, with applications ranging from object detection, natural language processing, and facial recognition to self-driving cars. The proliferation of these compute-intensive workloads has resulted in numerous hardware accelerators designed to fill the gap between the performance and energy-efficiency requirements of AI applications and the capabilities of current architectures such as CPUs and GPUs. In most cases these accelerators are specialized for a particular task, are costly to produce, require special programming tools, and can become obsolete as new ML algorithms are introduced. To solve these problems, we present EXTREM-EDGE, a hardware/software co-design approach that adds custom extensions to the open-source RISC-V instruction set architecture (ISA) to build a scalable and flexible ML processor architecture. EXTREM-EDGE augments the RISC-V processor with hardware AI functional units (AFUs) along with ISA extensions that directly target these AFUs. EXTREM-EDGE is a system-level solution that is easy to program, enables royalty-free production, and provides flexibility for future workloads. It enables designers to quickly adapt to hardware or ISA/software changes and allows design-space exploration of the available hardware, instruction, and software options. The result is a processor architecture that addresses the requirements of current AI/ML workloads, gives the flexibility to hot-swap AFUs when better hardware is available, and scales with new AI instructions in response to rapidly evolving AI algorithms, while providing a streamlined development flow for both hardware and software.
EXTREM-EDGE provides 1.75x (MAC) to 17.63x (PIM VMM) performance improvements for a GEMV kernel and 1.41x (MAC) to 4.41x (PIM VMM) reductions in processor clock cycles for the ResNet-8 neural network model from the MLPerf Tiny benchmark, depending on the size of the added accelerators and the complexity of the added instructions.

Full Text