In large-scale chip multiprocessors (CMPs), the scalability of the coherence directory becomes more important as the number of cores increases. However, previously proposed scalable coherence directories typically reduce directory storage overhead at the cost of performance, accuracy, complexity, or some combination of these. In this article, we propose the tag-sharer-fusion (TSF) directory, a scalable coherence directory that combines low hardware complexity with high performance and accuracy. Each directory entry has just enough bits to store a single sharer pointer and takes one of two primary formats, tag or sharer, where sharer entries store sharers but not tags. Each private block is tracked by a tag entry, and each shared block is tracked by a combination of a tag entry and a sharer entry in the same set. Simulation of a 128-core CMP with the PARSEC and SPLASH-2x benchmarks shows that the TSF directory requires only a quarter of the area of a non-scalable full-map sparse directory to achieve similar performance and network traffic, both with an average overhead within 1%. The TSF directory outperforms the state-of-the-art Pool and way-combining directory proposals in terms of storage overhead, performance, and network traffic.
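The tag/sharer split described above can be illustrated with a toy model of one directory set. This is a minimal sketch under stated assumptions, not the article's design: the names `TagEntry`, `SharerEntry`, and `DirectorySet` are hypothetical, the sharer entry's contents are modeled as a plain Python set because the abstract does not specify its encoding, and reusing the tag entry's pointer field as the way index of its paired sharer entry is an assumption made only to keep the model small. Eviction is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Union

NUM_CORES = 128
# Bits needed to name one core: 7 for 128 cores (a full-map vector would need 128).
POINTER_BITS = (NUM_CORES - 1).bit_length()


@dataclass
class TagEntry:
    tag: int
    # Assumption: core ID while the block is private,
    # way index of the paired sharer entry once it is shared.
    pointer: int
    shared: bool = False


@dataclass
class SharerEntry:
    # Assumption: encoding abstracted as a set of core IDs; no tag is stored.
    sharers: Set[int] = field(default_factory=set)


class DirectorySet:
    """One set of a hypothetical TSF-like directory."""

    def __init__(self, ways: int) -> None:
        self.ways: List[Optional[Union[TagEntry, SharerEntry]]] = [None] * ways

    def _find_tag(self, tag: int) -> Optional[TagEntry]:
        for entry in self.ways:
            if isinstance(entry, TagEntry) and entry.tag == tag:
                return entry
        return None

    def _free_way(self) -> int:
        for index, entry in enumerate(self.ways):
            if entry is None:
                return index
        raise RuntimeError("set full; eviction is not modeled in this sketch")

    def add_sharer(self, tag: int, core: int) -> None:
        entry = self._find_tag(tag)
        if entry is None:
            # Private block: a single tag entry is enough.
            self.ways[self._free_way()] = TagEntry(tag=tag, pointer=core)
        elif not entry.shared:
            if entry.pointer == core:
                return
            # Private to shared: allocate a sharer entry in the same set.
            way = self._free_way()
            self.ways[way] = SharerEntry(sharers={entry.pointer, core})
            entry.pointer, entry.shared = way, True
        else:
            self.ways[entry.pointer].sharers.add(core)

    def sharers_of(self, tag: int) -> Set[int]:
        entry = self._find_tag(tag)
        if entry is None:
            return set()
        if not entry.shared:
            return {entry.pointer}
        return set(self.ways[entry.pointer].sharers)

    def occupied_ways(self) -> int:
        return sum(entry is not None for entry in self.ways)
```

In this model a private block occupies one way and a shared block occupies two, which mirrors the abstract's description of a tag entry alone versus a tag entry combined with a sharer entry in the same set.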