Abstract

Non-volatile nanoscale memory devices such as memristors promise to overcome the scalability and leakage-current challenges of CMOS-based memory devices. These novel memories can be fabricated in the back-end-of-line of any CMOS process. Much current research investigates the benefits of memristors for associative memories, i.e., Content-Addressable Memories (CAMs) in which data is accessed by search. Searching for a particular bit in a memristor crossbar is time consuming, whereas search in a CMOS CAM is efficient. To combine the speed and ease of search of CMOS memory with the scalability of memristor memory, we present a novel multibit hybrid CMOS-Memristor Associative Memory Cell (MemCAM). The benefits of such memory cells manifest in on-chip caches: the instruction and data caches, the Branch Target Buffer, and the Translation Lookaside Buffer. To further demonstrate the benefit of the cell, we also simulate MemCAM as the TLB of an ARM processor and obtain up to a 50% decrease in the Data TLB miss rate and up to a 93% decrease in the Instruction TLB miss rate. An average speedup of 1.16 is also achieved on various benchmark applications from the PARSEC and MiBench suites.

Highlights

  • The problems of static power consumption and leakage current in nanometer CMOS memories have limited their scalability on processor chips

  • We propose the idea of using a conventional Content-Addressable Memory (CAM) cell for write, read, and search operations while integrating it with a memristor crossbar on top to expand capacity (see the behavioral sketch after these highlights)

  • This lets on-chip caches exploit the scalability of resistive memory and ensures that the switch from CMOS CAM to resistive CAM is smooth and seamless. We have demonstrated this functionality of MemCAM in the task-switch scenario by simulating an ARM processor with a MemCAM Translation Lookaside Buffer (TLB)

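The following is a minimal behavioral sketch (not a circuit-level model) of how such a hybrid cell could behave at the architectural level, assuming a simplified model in which only the small CMOS CAM is searched and the memristor crossbar serves as dense backing storage whose contents are swapped in and out on a task switch. The class and method names (HybridCAM, task_switch, search) are illustrative, not taken from the paper.

```python
# Behavioral sketch of a hybrid CMOS-memristor CAM (illustrative only).
# Assumption: searches are served by the small, fast CMOS CAM; the memristor
# crossbar holds the CAM images of inactive tasks and is swapped on task switch.

class HybridCAM:
    def __init__(self, cmos_entries):
        self.cmos_entries = cmos_entries   # capacity of the fast CMOS CAM
        self.cmos_cam = {}                 # tag -> data for the active task
        self.crossbar = {}                 # task_id -> saved CAM image
        self.active_task = None

    def write(self, tag, data):
        """Insert an entry for the active task, evicting an arbitrary entry when full."""
        if len(self.cmos_cam) >= self.cmos_entries:
            self.cmos_cam.pop(next(iter(self.cmos_cam)))
        self.cmos_cam[tag] = data

    def search(self, tag):
        """Associative lookup; only the CMOS CAM is searched, so lookups stay fast."""
        return self.cmos_cam.get(tag)      # None models a miss

    def task_switch(self, new_task):
        """Checkpoint the active task's entries to the crossbar and restore the
        new task's entries, instead of flushing them as a plain CMOS CAM would."""
        if self.active_task is not None:
            self.crossbar[self.active_task] = dict(self.cmos_cam)
        self.cmos_cam = dict(self.crossbar.get(new_task, {}))
        self.active_task = new_task


# Example: a tiny TLB-like use; task_A's entries survive a switch to task_B.
tlb = HybridCAM(cmos_entries=4)
tlb.task_switch("task_A")
tlb.write(0x1000, 0xA000)
tlb.task_switch("task_B")              # task_A's image saved to the crossbar
tlb.task_switch("task_A")              # ...and restored here
assert tlb.search(0x1000) == 0xA000
```

In this simplified model the crossbar effectively multiplies the CAM capacity by the number of resident tasks while leaving the fast CMOS search path unchanged, which is the intuition behind using MemCAM for task-specific structures such as the TLB.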

Summary

INTRODUCTION

The problems of static power consumption and leakage current in nanometer CMOS memories have limited their scalability on processor chips. Memristor-based CAM operation has been demonstrated only on a 2 × 2 crossbar structure, where the matching process was found to take 12 ns, corresponding to 36 cycles on a 3 GHz processor for a single-bit word. This delay is much higher than that of today's state-of-the-art CMOS CAMs [16]. A processor deploys many on-chip binary CAMs, such as the instruction and data caches, the Translation Lookaside Buffer (TLB), and the Branch Target Buffer (BTB), all of which are kept small due to the power dissipation and large layout footprint of CAM cells. The contents of these memories are specific to the currently executing task and are adaptively learned during that task's execution.
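For reference, the cycle count quoted above follows directly from the match latency and the clock frequency (a simple sanity check, not an additional result):

$$ N_{\text{cycles}} = t_{\text{match}} \times f_{\text{clk}} = 12\,\text{ns} \times 3\,\text{GHz} = 36 $$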

BACKGROUND
SIMULATION SETUP AND CIRCUIT EVALUATION
OPERATION OF THE MEMRISTOR CROSSPOINT
USE CASE
TEST CASE
Findings
CONCLUSION