Prefetching helps in reducing the memory access latency in multi-banked NUCA architecture, where the Last Level Cache (LLC) is shared. In such systems, an application running on core generates significant traffic on the shared resources, the underlying network and LLC. While prefetching helps to increase application performance, but an inaccurate prefetcher can cause harm by generating unwanted traffic that additionally increases network and LLC contention. Increased network contention results in untimely prefetching of cache blocks, thereby reducing the effectiveness of a prefetcher. Prefetch accuracy is extensively used to reduce unwanted prefetches that can mitigate the prefetcher caused contention. However, the conventional prefetch accuracy parameter has major limitations in NUCA architectures. The article exposes that prefetch accuracy can create two major false-positive cases of prefetching, Under-estimation and Over-estimation problems, and false feedback loop that can mislead a prefetcher in generating more unwanted traffic. We propose a novel technique, Coordinated Prefetching for Efficient (COPE), which addresses these issues by redefining prefetch accuracy for such architectures and identifies additional parameters that can avoid generating unwanted prefetch requests. Experiment conducted using PARSEC benchmark on a 64-core system shows that COPE achieve 3% reduction in L1 cache miss rate, 12.64% improvement in IPC, 23.2% reduction in average packet latency and 18.56% reduction in dynamic power consumption of the underlying network.
Read full abstract