In-package DRAM caches provide higher bandwidth than conventional memory systems. Adapting cache management to the run-time characteristics of each application is a promising approach for improving bandwidth efficiency and performance. Unfortunately, fine-grained monitoring and adaptation of cache blocks is often impractical due to significant bandwidth, performance, and hardware overheads. This paper proposes a novel mechanism for monitoring cache blocks using two parameters that are adjustable at run time. We propose two low-cost counter-based mechanisms to realize the block monitors in DRAM. Moreover, we propose a novel scheduling mechanism that opportunistically transfers the counter information to the DRAM stack when the data-movement overhead is at its minimum. Our simulation results on a set of data-intensive parallel applications indicate that the proposed mechanisms achieve average performance improvements of 31% and 24% over state-of-the-art DRAM cache architectures, with average system energy savings of 29% and 18% over the same baselines.