Abstract

The rise of chip multiprocessors (CMPs) as a promising trend in state-of-the-art high-performance processor design has made scalable cache directory organizations with simple cache coherence protocols an active research area. While thousands of cores are expected to fit on a single chip soon, previously proposed cache directory schemes still lack the scalability to accommodate more than tens of cores. The inefficiencies of these directory schemes take the form of unaffordable memory overhead, excessive coherence traffic that degrades performance because sharers are represented inexactly, and very complex coherence protocols. In this paper, we introduce a new cache directory scheme for many-core CMPs. The proposed scheme retains, and actually improves on, the scalability and low coherence traffic of cache-based linked-list directory schemes while avoiding their completely sequential operation by exploiting the parallel operation of limited-pointer directory schemes. We compare the proposed organization with these two earlier schemes on CMP configurations ranging from 4 to 32 cores. We show that, when directory pointers overflow in limited-pointer schemes, the proposed scheme avoids one third of the excess broadcast invalidation messages and two thirds of the extraneous acknowledgments. Compared with the completely sequential cache-based linked-list directory, the proposed scheme achieves around 10% better performance while reducing the number of invalidation messages per invalidation event by 24%.
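
To make the trade-off between the two baseline schemes concrete, the following is a minimal sketch of how a limited-pointer entry falls back to broadcast on overflow, and how a linked-list directory instead pays one sequential hop per sharer. It is an illustrative model only, not the scheme proposed in this paper: the class `LimitedPointerEntry`, the function `linked_list_invalidation_steps`, and all parameter names are hypothetical, and the model ignores acknowledgments, replacements, and timing.

```python
from dataclasses import dataclass, field


@dataclass
class LimitedPointerEntry:
    """Hypothetical directory entry with a fixed number of sharer pointers."""
    num_pointers: int
    sharers: set = field(default_factory=set)
    overflowed: bool = False

    def add_sharer(self, core: int) -> None:
        if self.overflowed:
            return  # the exact sharer set is already lost
        if core in self.sharers or len(self.sharers) < self.num_pointers:
            self.sharers.add(core)
        else:
            self.overflowed = True  # no free pointer: later writes must broadcast

    def invalidation_targets(self, writer: int, num_cores: int) -> list:
        """Cores that receive an invalidation when `writer` writes the block."""
        if self.overflowed:
            # Inexact representation: every other core is invalidated in parallel.
            return [c for c in range(num_cores) if c != writer]
        return sorted(self.sharers - {writer})


def linked_list_invalidation_steps(sharer_chain: list, writer: int) -> int:
    """Sequential hops to walk an exact sharer list, one invalidation per hop."""
    return sum(1 for core in sharer_chain if core != writer)


if __name__ == "__main__":
    entry = LimitedPointerEntry(num_pointers=2)
    for core in (1, 2, 3):  # the third sharer overflows the two pointers
        entry.add_sharer(core)
    print(len(entry.invalidation_targets(writer=0, num_cores=16)))
    print(linked_list_invalidation_steps([1, 2, 3], writer=0))
```

In this toy model, three actual sharers on a 16-core chip cost 15 parallel invalidations once the two pointers overflow, whereas the linked list sends only three invalidations but must issue them one after another.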
