Abstract
In-memory processing can dramatically improve the latency and energy consumption of computing systems by minimizing the data transfer between the memory and the processor. Efficient execution of processing operations within the memory is therefore a highly motivated objective in modern computer architecture. This article presents a novel automatic framework for the efficient implementation of arbitrary combinational logic functions within a memristive memory. Using tools from logic design, graph theory, and compiler register allocation, we developed synthesis and in-memory mapping of logic execution in a single row (SIMPLER), a tool that optimizes the execution of in-memory logic operations in terms of throughput and area. Given a logic function, SIMPLER automatically generates a sequence of atomic memristor-aided logic (MAGIC) NOR operations and efficiently maps them into a single size-limited memory row, reusing cells to save area when needed. This approach fully exploits the parallelism offered by the MAGIC NOR gates: multiple instances of the logic function can be performed concurrently, each compressed into a single row of the memory. This property makes SIMPLER an attractive candidate for designing in-memory single instruction, multiple data (SIMD) operations. Compared to previous work (which optimizes the latency, rather than the throughput, of a single function), SIMPLER achieves an average throughput improvement of $435\times$. When the previous tools are parallelized similarly to SIMPLER, SIMPLER still achieves at least $5\times$ higher throughput, with a $23\times$ improvement in area and a $20\times$ improvement in area efficiency. These improvements more than compensate for the increase in latency (up to 17% on average).
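The mapping step described above resembles compiler register allocation: each NOR output needs a cell in the row, and a cell can be recycled once every gate that reads it has executed. The following minimal Python sketch illustrates this idea under stated assumptions; the function name, data structures, and ordering policy are hypothetical and do not reproduce the authors' implementation.

```python
# Hypothetical sketch of SIMPLER-style cell allocation: map a DAG of 2-input
# NOR gates onto a fixed-size memory row, reusing a cell once every gate that
# reads it has been executed. Names and structures are illustrative only.

from collections import defaultdict

def map_to_row(gates, primary_inputs, row_size):
    """gates: list of (gate_id, in_a, in_b) in topological order.
       primary_inputs: signal ids whose values are pre-loaded into the row.
       Returns a schedule of (cycle, out_cell, cell_a, cell_b) NOR operations."""
    fanout = defaultdict(int)                 # remaining readers of each signal
    for _, a, b in gates:
        fanout[a] += 1
        fanout[b] += 1

    cell_of = {}                              # signal id -> cell index in the row
    free = list(range(row_size - 1, -1, -1))  # stack of currently free cells
    for pi in primary_inputs:                 # inputs occupy cells up front
        cell_of[pi] = free.pop()

    schedule = []
    for cycle, (g, a, b) in enumerate(gates):
        if not free:
            raise RuntimeError("row too small for this gate ordering")
        out = free.pop()                      # a reused cell must be re-initialized
        cell_of[g] = out                      # before the MAGIC NOR executes
        schedule.append((cycle, out, cell_of[a], cell_of[b]))
        for src in (a, b):                    # release a cell once its last
            fanout[src] -= 1                  # reader has been executed
            if fanout[src] == 0 and src not in primary_inputs:
                free.append(cell_of[src])
    return schedule
```

For example, calling map_to_row on the topologically ordered NOR netlist of a full adder, with row_size set to the number of available cells per row, yields the cycle-by-cycle sequence of operations for one computation instance; because every instance uses only its own row, the same schedule can be issued to all rows of the array in parallel.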
Highlights
A basic assumption that has guided computer architects in the design of almost all modern computing systems is the separation between processing units and data storage units.
We evaluate the SIMPLER synthesis and mapping tool by calculating the latency, throughput, area, and area efficiency of each benchmark execution using the Python-based tool we developed (a sketch of such metric calculations follows this list).
This article presented an automatic logic synthesis flow called SIMPLER for optimizing the throughput of in-memory SIMD computations.
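As a rough illustration of the evaluation flow mentioned in the highlights, the sketch below computes the four metrics for a mapped design. The exact definitions used by the authors' Python-based tool are not given here, so the formulas should be read as common-convention assumptions rather than the paper's actual metrics.

```python
# Hypothetical metric calculation for a design mapped into rows of a crossbar.
# Definitions are assumptions chosen for illustration, not the authors' exact ones.

def evaluate(num_cycles, cells_per_row, num_rows, cell_area_um2=1.0):
    latency = num_cycles                              # cycles to finish one instance
    throughput = num_rows / num_cycles                # instances per cycle, one
                                                      # instance per row in parallel
    area = num_rows * cells_per_row * cell_area_um2   # total cell area occupied
    area_efficiency = throughput / area               # throughput per unit area
    return latency, throughput, area, area_efficiency
```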
Summary
A basic assumption that has guided computer architects in the design of almost all modern computing systems is the separation between processing units and data storage units. We work under the single instruction, multiple data (SIMD) [2] concept, exploiting parallelism among different computation instances rather than optimizing a single instance of a given logic function. Although the latency of a single computation instance may be slightly higher, the overall throughput of the array increases dramatically thanks to the ability to compute each instance in a different row in parallel. Such a configuration allows SIMD operations to be supported efficiently in an mMPU setup for the first time. The magnitude of this paradigm change is illustrated by a simple, realistic example: a system consisting of a 512-row memory array can execute 512 computation instances in parallel. When the previous tools are parallelized in a similar manner to SIMPLER, SIMPLER offers at least 5× higher throughput and 23× smaller area usage.
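As a back-of-the-envelope illustration (assuming, for simplicity, a baseline that executes one latency-optimized instance at a time), if a single instance takes $T$ cycles in a latency-optimized flow and $1.17\,T$ cycles under SIMPLER, a 512-row array completes 512 instances every $1.17\,T$ cycles, i.e., roughly $512/1.17 \approx 438\times$ the baseline throughput, which is of the same order as the reported $435\times$ average improvement.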