Abstract

In high-performance computing (HPC) applications, the speed of the L1 cache typically determines the maximum frequency (f_MAX) of the processor core. Companies that mass-produce high-performance microprocessors commonly build the L1 cache from fully custom macros to ensure that the L1 cache does not limit the f_MAX or throughput of the processor. It is also common for custom L1 cache designs to use a two-port 8T or a large 6T bitcell, along with domino read logic and very short bitlines (BL) [2,3]. These designs trade off density and area for high performance. This paper presents a different approach, one that can satisfy a range of applications: a memory compiler that can generate more than 10,000 different high-speed L1 cache macro configurations. The 7nm L1-cache compiler described in this paper uses a high-current (HC) 6T bitcell, which is more area efficient than an 8T bitcell. The HC bitcell, along with small-signal sensing, allows for long BL (256b), leading to further area-efficiency improvements. Since these L1 macros are just as likely to be used in mobile applications as in HPC applications, they were implemented using the array dual-rail (ADR) architecture [4]. The ADR architecture (Fig. 11.3.1) allows the periphery circuits of the L1 macro to operate at the same voltage as the processor core; a lower V_DD results in dynamic power savings. ADR also performs better than an interface dual-rail design when the SRAM and logic supplies are equivalent, because the ADR design does not incur level-shifter delays on its inputs or outputs.
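
The link between a lower V_DD and dynamic power savings can be illustrated with the standard first-order CMOS switching-power estimate, P = a * C * V_DD^2 * f. The sketch below is only an illustration of that textbook relation. The function names and the voltage values are hypothetical and are not taken from the paper, and the model ignores leakage and short-circuit power.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order CMOS switching power: P = activity * C * VDD^2 * f.

    c_eff    -- effective switched capacitance in farads (assumed value)
    vdd      -- supply voltage in volts
    freq     -- clock frequency in hertz
    activity -- switching activity factor between 0 and 1
    """
    return activity * c_eff * vdd ** 2 * freq


def relative_saving(vdd_high, vdd_low):
    """Fractional dynamic-power saving when only VDD is lowered.

    Capacitance, frequency, and activity cancel out, so the saving
    depends only on the ratio of the two supply voltages.
    """
    return 1.0 - (vdd_low / vdd_high) ** 2


if __name__ == "__main__":
    # Hypothetical supply voltages, chosen only for illustration.
    saving = relative_saving(vdd_high=0.90, vdd_low=0.75)
    print(f"Dynamic power saving in the periphery: {saving:.1%}")
```

Under this model the saving depends only on the ratio of the two supply voltages, so it applies to whatever part of the macro actually runs on the lower rail (in an ADR design, the periphery).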
