In-Memory Computing (IMC) aims to close the performance gap between CPUs and memories caused by the memory wall. However, general-purpose IMC does not optimize data transfers for access patterns such as stencils and convolutions. This paper proposes a new Instruction Set Architecture (ISA) and a novel pattern encoding for IMC to transfer and organize data streams so that computation can be performed efficiently. This instruction set is implemented on the Data-locality Management Unit (DMU) as a subset of the Computational SRAM (C-SRAM) Instruction Set Architecture. A programming model for interacting with the DMU at the language level is also presented. The DMU ISA is evaluated on six applications run on three different system nodes. These system nodes are based on existing RISC-V cores and range from the embedded to the high-performance computing domain. Experiments show, on average, a speed-up of ×8.81, an energy reduction factor of ×6.81, and an improvement in the number of operations per cycle of ×4.59 for the C-SRAM architecture integrating the proposed DMU ISA, compared to a reference implementation on embedded systems. Results also show an improvement in the number of operations per cycle of ×2.99 compared to a reference implementation across all system nodes.