Abstract

An efficient and scalable cache coherence protocol is crucial to high-performance shared-memory servers. Directory-based cache coherence protocols scale better than snooping-based protocols. However, even directory-based protocols are hard to scale to a large number of cores because the directories require additional area. We observed that a significant percentage of referenced memory blocks are accessed by only a single core, even in parallel applications, so they can be treated as private memory blocks. This observation suggests that blocks accessed by a single core do not need coherence maintenance. The challenge is to identify private blocks and to track changes in their access patterns. We propose a novel hardware approach that (1) dynamically identifies shared memory blocks at cache-block granularity and (2) bypasses the coherence procedure for private memory blocks. This increases the effectiveness of the directory-based approach and therefore improves system performance. Experimental results show that, on average, our approach (1) avoids coherence tracking for about 54% of referenced memory blocks, (2) reduces coherence overhead by 77%, (3) avoids 8% of L2 cache misses, and (4) shortens the execution time of parallel applications by 13%.
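The private/shared idea can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' hardware design: it assumes 64-byte cache blocks, treats the first core to touch a block as its owner, keeps the block private (bypassing the directory) until a second core accesses it, and then promotes it to shared and starts directory tracking. The names `PrivateSharedClassifier`, `access`, and `private_fraction` are hypothetical, and the sketch omits the shared-to-private transition that tracking access-pattern changes would require.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

PRIVATE = "private"
SHARED = "shared"


@dataclass
class BlockInfo:
    owner: int              # first core observed touching this block
    state: str = PRIVATE    # every block starts out private


@dataclass
class PrivateSharedClassifier:
    """Hypothetical model: private blocks bypass a toy directory."""
    block_size: int = 64    # assumed cache-block size in bytes
    blocks: Dict[int, BlockInfo] = field(default_factory=dict)
    directory: Dict[int, Set[int]] = field(default_factory=dict)  # block -> sharers

    def block_of(self, addr: int) -> int:
        # Classification happens at cache-block granularity.
        return addr // self.block_size

    def access(self, core: int, addr: int) -> bool:
        """Record an access; return True only if the directory had to track it."""
        blk = self.block_of(addr)
        info = self.blocks.get(blk)
        if info is None:
            # First reference: the block is private to this core, no directory entry.
            self.blocks[blk] = BlockInfo(owner=core)
            return False
        if info.state == PRIVATE:
            if core == info.owner:
                return False  # still private: coherence procedure bypassed
            # A second core appeared: promote to shared and seed the directory
            # with the previous owner, which may still hold a copy.
            info.state = SHARED
            self.directory[blk] = {info.owner}
        self.directory[blk].add(core)
        return True

    def private_fraction(self) -> float:
        # Share of referenced blocks that never needed coherence tracking.
        if not self.blocks:
            return 0.0
        private = sum(1 for b in self.blocks.values() if b.state == PRIVATE)
        return private / len(self.blocks)


if __name__ == "__main__":
    c = PrivateSharedClassifier()
    c.access(0, 0x1000)  # core 0 touches block 64: private
    c.access(0, 0x1008)  # same block, same core: still private
    c.access(1, 0x1010)  # core 1 touches block 64: promoted to shared
    c.access(2, 0x2000)  # core 2 touches block 128: private
    print(c.private_fraction())
```

In this toy run, one of the two referenced blocks stays private, so `private_fraction()` returns 0.5. Real hardware would also need to handle the owner's cached copy during promotion and to demote blocks that become private again, which this sketch leaves out.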
