Abstract

To improve the performance of SRAM in caches at near-threshold voltages, several timing speculation techniques, such as cross-sensing SRAM (CS-SRAM), have been proposed. For a given process, voltage, and temperature (PVT) condition, CS-SRAM has an optimal bitline discharging time (TBL) that yields the lowest average access latency. However, existing timing speculation caches do not track PVT variations to adjust the access timing to the optimal TBL point, at which the system has the lowest average memory access time. In this article, we propose a design of CS-SRAM-based L1 caches with a PVT autotracking mechanism, namely TS-PULP, which adjusts both the TBL and the system clock frequency to their optimal points. To quantify the improvement of our approach, we have also implemented a cycle-accurate RTL model of CS-SRAM and a field-programmable gate array (FPGA) prototype of the proposed L1 caches on the open-source system-on-chip (SoC) platform PULP. According to the evaluation results from RTL simulations and the FPGA prototype, the proposed caches achieve performance similar to (over 80% of) that of the original standard cell memory (SCM)-based PULP design under TSMC 28 nm, 0.5 V, and 25 °C with only about 30% of the chip area. In addition, we introduce a figure of merit (FOM) combining million instructions per second (MIPS), area, and energy (MAE) to comprehensively evaluate different approaches. The proposed scheme, TS-PULP, achieves the best MAE FOM among four architectures (PULP with SCM, PULP with SRAM, TS-cache, and TS-PULP) across different cache sizes.
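
The trade-off behind an optimal TBL can be illustrated with a small numerical sketch. The abstract does not give the latency model, so everything below is an assumption made for illustration only: a logistic curve stands in for the probability that a speculative read fails at a given TBL, a fixed retry penalty is charged on failure, and the names `miss_speculation_rate`, `average_access_latency`, `optimal_t_bl`, `t_half`, `steepness`, and `retry_penalty` are hypothetical. The numbers are arbitrary and are not taken from the article.

```python
import math


def miss_speculation_rate(t_bl, t_half=1.0, steepness=6.0):
    """ASSUMED model: probability that a read speculated at bitline
    discharging time t_bl (arbitrary time units) is wrong and must be retried.
    A decreasing logistic curve; t_half stands in for the PVT condition."""
    return 1.0 / (1.0 + math.exp(steepness * (t_bl - t_half)))


def average_access_latency(t_bl, retry_penalty=2.0, **model):
    """Expected latency: every access pays t_bl, and a failed speculation
    additionally pays retry_penalty (an assumed constant)."""
    return t_bl + miss_speculation_rate(t_bl, **model) * retry_penalty


def optimal_t_bl(candidates, **kwargs):
    """Pick the candidate TBL with the lowest expected access latency."""
    return min(candidates, key=lambda t: average_access_latency(t, **kwargs))


# Sweep a grid of candidate TBL settings for two hypothetical PVT conditions.
candidates = [0.4 + 0.05 * i for i in range(40)]
best_nominal = optimal_t_bl(candidates)            # assumed nominal condition
best_slow = optimal_t_bl(candidates, t_half=1.5)   # assumed slower condition

print(f"nominal optimum TBL: {best_nominal:.2f}")
print(f"slow-corner optimum TBL: {best_slow:.2f}")
```

Under these assumptions, a TBL that is too short is dominated by retry penalties and one that is too long wastes time on every access, so the minimum lies in between. Shifting `t_half` moves that minimum, which is the kind of drift a PVT autotracking mechanism would need to follow.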
