Abstract

In this paper we introduce ICCI, a new cache organization that leverages shared cache resources and flat coherence protocols to provide inexpensive hardware cache coherence for large core counts (e.g., 512), achieving execution times close to a non-scalable sparse directory while noticeably reducing the energy consumption of the memory system. Very simple changes in the system with respect to traditional bit-vector directories are enough to implement ICCI. Moreover, ICCI does not introduce any storage overhead with respect to a broadcast-based protocol, yet it provides large storage space for coherence information. ICCI makes smarter use of cache resources by dynamically allowing last-level cache entries to store blocks or sharing codes. This way, just the minimum number of directory entries required at runtime are allocated. Besides, ICCI suffers a negligible amount of directory-induced invalidations. Results for a 512-core CMP show that ICCI reduces the energy consumption of the memory system by up to 48 percent compared to a tag-embedded directory, up to 15 percent compared to a sparse directory, and up to 8 percent compared to the state-of-the-art Scalable Coherence Directory which ICCI also outperforms in execution time. In addition, ICCI can be used in combination with elaborated sharing codes to apply it to extremely large core counts. We also show analytically that ICCI’s dynamic allocation of entries makes it a suitable candidate to store coherence information efficiently for very large core counts (e.g., over 200K cores), based on the observation that data sharing makes fewer directory entries necessary per core as core count increases.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call