Chapter 3 - The Cache Design Problem and Its Solution

Steven A Przybylski

doi:10.1016/b978-0-08-050059-1.50008-6

Abstract

This chapter discusses the cache design problem and presents its solution. Three things are needed to investigate the tradeoffs in memory hierarchy design experimentally: a trace-driven simulator, a set of points in the design space to be simulated, and a set of traces used to stimulate those memory hierarchies. The primary goals in the design of the memory hierarchy simulator were accuracy, flexibility, and efficiency. Most cache simulators report only miss rates and other nontemporal statistics. In this case, though, there were additional requirements to accumulate accurate cycle counts and to attribute execution time precisely to specific components and operations within the simulated system. Flexibility refers to the ability of a single simulator and its associated analysis programs to emulate a wide variety of target systems. Again, most cache simulators need only vary a cache's size, associativity, and block size, but this simulator needed to model an arbitrarily deep memory hierarchy, with variation of all the temporal parameters throughout the system. Only two base scenarios are used in the simulation experiments that follow: one with a single level of split I/D caches and a second with a two-level hierarchy. The first base model has a Harvard organization. The split instruction (I) and data (D) virtual caches are 64 kilobytes (KB) each, organized as 4K blocks of four words (W), direct-mapped. Entire blocks are fetched on a miss. The data cache is write-back, with no fetch done on a write miss. All read hits take one CPU cycle, while write hits take two: one to access the tags, followed by one to write the data.
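The geometry and hit timings of the first base model can be checked with a few lines of arithmetic. The following Python fragment is only an illustrative sketch, not the chapter's simulator: it assumes 4-byte words (the abstract does not state the word size, but 4K blocks of four 4-byte words gives the quoted 64 KB), and the names `split_address` and `hit_cycles` are hypothetical helpers introduced here. Miss penalties are deliberately left out because the abstract does not give them.

```python
# Sketch of the first base model's cache geometry and hit timing.
# ASSUMPTION: 4-byte (32-bit) words; the abstract does not state the word size.
WORD_BYTES = 4
WORDS_PER_BLOCK = 4        # from the text: blocks of four words
NUM_BLOCKS = 4 * 1024      # from the text: 4K blocks per cache, direct-mapped

BLOCK_BYTES = WORD_BYTES * WORDS_PER_BLOCK    # 16 bytes per block
CACHE_BYTES = NUM_BLOCKS * BLOCK_BYTES        # 65536 bytes = 64 KB

# Both quantities are powers of two, so bit_length() - 1 equals log2.
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1    # 4 bits of byte offset
INDEX_BITS = NUM_BLOCKS.bit_length() - 1      # 12 bits of block index

READ_HIT_CYCLES = 1        # from the text: read hits take one CPU cycle
WRITE_HIT_CYCLES = 2       # from the text: one tag access plus one data write


def split_address(addr):
    """Split a byte address into (tag, index, offset) for a direct-mapped cache."""
    offset = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def hit_cycles(read_hits, write_hits):
    """CPU cycles spent on cache hits only (hypothetical helper; misses excluded)."""
    return read_hits * READ_HIT_CYCLES + write_hits * WRITE_HIT_CYCLES


if __name__ == "__main__":
    print(CACHE_BYTES // 1024, "KB per cache")
    print([hex(field) for field in split_address(0x12345678)])
    print(hit_cycles(read_hits=10, write_hits=3), "cycles on hits")
```

Under the 4-byte-word assumption, each address splits into a 4-bit byte offset, a 12-bit block index, and the remaining upper bits as the tag; a different word size would change the offset width and the total capacity accordingly.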

Full Text