Towards a Truly Integrated Vector Processing Unit for Memory-bound Applications Based on a Cost-competitive Computational SRAM Design Solution

Maha Kooli,Valentin Egloff,Jean-Philippe Noel,Henri-Pierre Charles,Bastien Giraud,Roman Gauchi,Kevin Mambu,Mona Ezzadeen,Antoine Heraud

doi:10.1145/3485823

Abstract

This article presents Computational SRAM (C-SRAM) solution combining In- and Near-Memory Computing approaches. It allows performing arithmetic, logic, and complex memory operations inside or next to the memory without transferring data over the system bus, leading to significant energy reduction. Operations are performed on large vectors of data occupying the entire physical row of C-SRAM array, leading to high performance gains. We introduce the C-SRAM solution in this article as an integrated vector processing unit to be used by a scalar processor as an energy-efficient and high performing co-processor. We detail the C-SRAM system design on different levels: (i) circuit design and silicon proof of concept, (ii) system interface and instruction set architecture, and (iii) high-level software programming and simulation. Experimental results on two complete memory-bound applications, AES and MobileNetV2, show that the C-SRAM implementation achieves up to 70× timing speedup and 37× energy reduction compared to scalar architecture, and up to 17× timing speedup and 5× energy reduction compared to SIMD architecture.

Full Text