Abstract

The ever-increasing cost of data movement in computer systems is driving a new era of data-centric computing. common data-centric paradigms is near-data computing (NDC), where accelerators are placed <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">inside</i> the memory hierarchy to avoid the costly transfer of data to the core. potential to improve performance and energy efficiency. Unfortunately, adding accelerators into the memory hierarchy incurs significant complexity for system integration because accelerators often require cache-coherent access to memory. coherence protocols required to handle both cores and cache-attached accelerators result in significantly higher verification costs as well as an increase in directory state and on-chip network traffic. baseline processor performance. To simplify the integration of cache-attached accelerators, we present Kobold, a new coherence protocol and implementation which restricts the added complexity of an accelerator to its local tile. Kobold introduces a new directory structure within the L2 cache to track the accelerator's private cache and maintain coherence between the core and accelerator. A minor modification to the LLC protocol also enables accelerators to improve performance by bypassing the local L2. We verified Kobold's stable-state coherence protocols using the Murphi model checker and estimated area overhead using Cacti 7. Kobold simplifies integration of cache-attached accelerators, adds only 0.09% area over the baseline caches, and provides clear performance advantages vs. naïve extensions of existing directory coherence protocols.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call