Cache memory systems account for a significant portion of a processor's static and dynamic power consumption. Likewise, the access latency of the cache memory system significantly impacts overall processor performance. Several techniques have been proposed to address power or performance individually; however, almost all trade performance for power or vice versa. We propose a novel scheme that improves performance while reducing both static and dynamic power with minimal area overhead. Our scheme reduces dynamic power by using a hash-based mechanism to minimize the number of cache lines read during program execution: lines that are guaranteed non-matches (i.e., cache misses) for a new access are identified and not read. Performance improves when all cache lines of a referenced set are determined to be non-matches to the requested address, allowing the access to skip several cache pipeline stages as a guaranteed miss. Static power savings are achieved by exploiting in-flight cache access information to deterministically lower the power state of cache lines that are guaranteed not to be accessed in the immediate future. These techniques integrate easily into existing cache architectures and were evaluated using widely known CAD tools and benchmarks. We observed up to 92, 17, and 2 percent improvements in performance, static power, and dynamic power, respectively, with less than 3 percent area overhead.
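To make the hash-based filtering idea concrete, the sketch below models one set of a set-associative cache in software. It is not the paper's actual hardware design: the 8-bit XOR-folded hash, the structure names, and the 4-way geometry are illustrative assumptions. Each line stores a small hash of its tag; on an access, only ways whose stored hash equals the hash of the incoming tag can possibly match, so all other ways need not be read, and an all-zero mask signals a guaranteed miss.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS 4

/* Hypothetical 8-bit hash of a full cache tag (XOR-folding the tag bytes).
 * Equal tags always produce equal hashes, so a hash mismatch proves a
 * tag mismatch; a hash match may still be a collision and requires a
 * full tag compare before declaring a hit. */
static uint8_t tag_hash(uint32_t tag) {
    return (uint8_t)(tag ^ (tag >> 8) ^ (tag >> 16) ^ (tag >> 24));
}

typedef struct {
    uint32_t tag;   /* full tag, read only for ways that might match */
    uint8_t  hash;  /* small per-line hash, checked first            */
    bool     valid;
} cache_line_t;

/* Returns a bitmask of ways that might match the requested tag.
 * Ways absent from the mask are guaranteed non-matches, so their tag
 * and data arrays are never read (dynamic power saving). A zero mask
 * is a guaranteed miss, letting the access skip later pipe stages. */
static unsigned possible_match_mask(const cache_line_t set[NUM_WAYS],
                                    uint32_t req_tag) {
    uint8_t h = tag_hash(req_tag);
    unsigned mask = 0;
    for (int way = 0; way < NUM_WAYS; way++) {
        if (set[way].valid && set[way].hash == h)
            mask |= 1u << way;
    }
    return mask;
}
```

Because a narrow hash can collide, the mask is conservative: it may include a way that ultimately misses on the full tag compare, but it never excludes the true matching way, so correctness is preserved while most non-matching reads are avoided.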