As we enter the era of many-core, providing the shared memory abstraction through cache coherence has become progressively difficult. The standard directory-based coherence does not scale well with increasing core count. Timestamp-based hardware coherence protocols introduced recently offer an attractive alternative solution. This paper proposes a timestamp-based coherence protocol, called TC-Release ++ , that efficiently supports cache coherence in large-scale systems. Our approach is inspired by TC-Weak , a recently proposed timestamp-based coherence protocol targeting GPU architectures. We first design TC-Release in an attempt to straightforwardly port TC-Weak to general-purpose many-cores. But re-purposing TC-Weak for general-purpose many-core architectures is challenging due to significant differences both in architecture and the programming model. Indeed the performance of TC-Release turns out to be worse than conventional directory protocols. We overcome the limitations and overheads of TC-Release by exploiting simple hardware support to eliminate frequent memory stalls, and an optimized lifetime prediction mechanism to improve cache performance. The resulting optimized coherence protocol TC-Release ++ is highly scalable (storage scales logarithmically with core count) and shows better performance (3.0 percent) and comparable network traffic (within 1.3 percent) relative to the baseline MESI directory protocol. We use Murphi to formally verify that TC-Release ++ is error-free and imposes small verification cost.
Read full abstract