NUDA: Non-Uniform Directory Architecture for Scalable Chip Multiprocessors

Wei Shu, Nian-Feng Tzeng

doi:10.1109/tc.2017.2773061

Open Access

https://doi.org/10.1109/tc.2017.2773061

Journal: IEEE Transactions on Computers
Publication Date: May 1, 2018
Citations: 22
License type: publisher-specific, author manuscript

Affiliation: University of Louisiana at Lafayette

Abstract

Chip multiprocessors (CMPs) incur directory storage overhead if cache coherence is realized via sharer tracking. This work proposes a novel framework dubbed non-uniform directory architecture (NUDA), which leverages two insights: that the number of “active” directory entries required to stay on chip is usually small within a short execution time window, owing to high directory locality, and that the fraction of interrogated directory entries drops as the core count rises. Unlike earlier storage overhead reduction techniques, which require all cached LLC blocks to have their directory entries fully on chip, NUDA dynamically buffers only the most active directory vectors (DVs) on chip while keeping the DVs of all LLC blocks in a backing store in lower-level storage. NUDA attains its efficiency via a criticality-aware replacement policy (CARP) for on-chip buffer management and effective prefetching to pre-activate vectors (PAVE) for upcoming coherence interrogations. We have evaluated NUDA through gem5 simulation of 64-core CMPs under the PARSEC and SPLASH benchmarks, demonstrating that CARP and PAVE significantly enhance on-chip directory storage efficiency. NUDA with a small on-chip buffer for DVs exhibits negligible performance degradation (within 2.6 percent) compared to a full on-chip directory, while outperforming its previous counterparts for directory area reduction when the on-chip directory budget is scarce, as required for high scalability.
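
The buffering idea in the abstract can be illustrated with a toy model: a small on-chip buffer holds the most recently used DVs, and a backing store holds the DVs of all tracked blocks. This is only a sketch under stated assumptions, not the authors' design. Plain LRU stands in for CARP, there is no PAVE prefetching, and the class name `DirectoryBuffer` and all method names are hypothetical.

```python
from collections import OrderedDict


class DirectoryBuffer:
    """Toy model of a small on-chip DV buffer over a full backing store.

    Assumption: plain LRU replacement is used here as a stand-in; the paper's
    CARP policy and PAVE prefetching are NOT modeled.
    """

    def __init__(self, capacity, num_cores):
        self.capacity = capacity
        self.num_cores = num_cores
        self.on_chip = OrderedDict()  # block -> sharer bitmask, most recent last
        self.backing = {}             # DVs written back from the on-chip buffer
        self.hits = 0
        self.misses = 0

    def _activate(self, block):
        """Make sure the block's DV is on chip, evicting the LRU DV if needed."""
        if block in self.on_chip:
            self.hits += 1
            self.on_chip.move_to_end(block)
            return
        self.misses += 1
        if len(self.on_chip) >= self.capacity:
            victim, dv = self.on_chip.popitem(last=False)
            self.backing[victim] = dv  # write the evicted DV back
        # Fetch the DV from the backing store (empty vector if never seen).
        self.on_chip[block] = self.backing.get(block, 0)

    def add_sharer(self, block, core):
        """Record that `core` holds a copy of `block`."""
        self._activate(block)
        self.on_chip[block] |= 1 << core

    def sharers(self, block):
        """Return the list of cores currently recorded as sharing `block`."""
        self._activate(block)
        dv = self.on_chip[block]
        return [c for c in range(self.num_cores) if (dv >> c) & 1]
```

In this model, an interrogation that misses the on-chip buffer pays a backing-store access. That miss cost is what a better replacement policy and prefetching would aim to hide.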

Full Text