Arbitration Latency Research Articles

This article addresses the challenge of allowing simultaneous and predictable accesses to shared data on multi-core systems. We propose a collection of predictable cache coherence protocols, which mandate the use of certain design invariants to ensure predictability. In particular, we enforce these invariants by augmenting the classic Modified-Shared-Invalid (MSI) and Modified-Exclusive-Shared-Invalid (MESI) protocols. This allows us to derive worst-case latency bounds for the resulting predictable MSI (PMSI) and predictable MESI (PMESI) protocols. Our analysis shows that while the arbitration latency scales linearly with the number of cores, the coherence latency scales quadratically, which emphasizes the importance of accounting for cache coherence effects in latency bounds. We implement PMSI and PMESI in a detailed micro-architectural simulator and execute SPLASH-2 and synthetic workloads. Results show that observed latencies always stay within the analytical worst-case bounds, and that PMSI and PMESI improve average-case performance by up to 4× over cache-bypassing mechanisms that disallow caching of shared data in the cores’ private caches. PMSI and PMESI have average slowdowns of 1.45× and 1.46× compared to conventional MSI and MESI protocols, respectively.
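To make the linear scaling of arbitration latency concrete, here is a minimal sketch assuming a round-robin time-division-multiplexed (TDM) bus arbiter with one fixed-width slot per core. The abstract does not specify the arbitration scheme, so the TDM policy, the `slot_width` parameter, and both function names are illustrative assumptions. The quadratic coherence bound is not modeled here.

```python
def wait_for_slot(core, arrival, n_cores, slot_width):
    """Cycles a request waits until the start of its core's next TDM slot.

    Assumption (hypothetical): a request can only be issued at the
    start of the requesting core's own slot.
    """
    period = n_cores * slot_width      # length of one full TDM round
    slot_start = core * slot_width     # offset of this core's slot in the round
    return (slot_start - arrival) % period


def worst_case_arbitration_latency(n_cores, slot_width):
    """Exhaustively search one TDM period for the longest wait."""
    period = n_cores * slot_width
    return max(
        wait_for_slot(core, t, n_cores, slot_width)
        for core in range(n_cores)
        for t in range(period)
    )


if __name__ == "__main__":
    # Doubling the core count roughly doubles the worst-case wait.
    for n in (2, 4, 8):
        print(n, worst_case_arbitration_latency(n, slot_width=50))
```

Under these assumptions the worst case is one cycle short of a full TDM period (a request that just misses its slot), so it grows in proportion to the number of cores.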

SUMMARY: Optical networks on chip based on silicon photonics have been proposed to reduce latency and power consumption in future chip multiprocessors. However, high-performance chip multiprocessors use a shared memory model, which generates large numbers of short messages, creating high arbitration latency overhead for photonic switching networks. In this paper, we explore techniques that intelligently use information from the memory hierarchy to predict communication in order to set up photonic circuits with reduced or eliminated arbitration latency. First, we present a switch scheduling algorithm that arbitrates on a per-memory-transaction basis and holds photonic circuits open to exploit temporal locality. We show that this can reduce the average arbitration latency overhead by 60% and eliminate arbitration latency altogether for up to 70% of memory transactions. We then demonstrate that this switch scheduling algorithm, operating with a central photonic crossbar or Clos switch, has significant energy efficiency benefits over arbitration-free photonic networks such as single-writer multiple-reader networks. Finally, we demonstrate that cache miss prediction can be used to predict 86% of more complex memory transactions involving multiple nodes or main memory. Copyright © 2014 John Wiley & Sons, Ltd.
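The circuit-holding idea can be illustrated with a toy model. This is not the paper's scheduling algorithm: it assumes one held circuit per source node, a fixed `arbitration_cost` in cycles, and a hypothetical trace of (source, destination) transactions, purely to show how temporal locality removes arbitration from repeated transfers.

```python
def simulate_circuit_holding(transactions, arbitration_cost=10):
    """Return (total arbitration cycles, transactions with zero arbitration).

    Each source keeps its most recent circuit open; a transaction to the
    same destination reuses it and pays no arbitration. Destination-port
    conflicts, which a real switch scheduler must resolve, are ignored.
    """
    held = {}   # source -> destination of the circuit currently held open
    total = 0   # arbitration cycles paid across the whole trace
    free = 0    # transactions that reused an already-open circuit
    for src, dst in transactions:
        if held.get(src) == dst:
            free += 1                 # circuit already open: no arbitration
        else:
            total += arbitration_cost # set up a new circuit
            held[src] = dst
    return total, free


if __name__ == "__main__":
    # Hypothetical trace with repeated source/destination pairs.
    trace = [(0, 3), (0, 3), (1, 2), (0, 3), (1, 4), (1, 4)]
    print(simulate_circuit_holding(trace))
```

In this made-up trace, half of the transactions reuse an open circuit, so the total arbitration cost is half of what arbitrating every transaction would incur. The percentages reported in the abstract come from the authors' evaluation, not from a model like this one.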

Articles published on Arbitration Latency

Designing Predictable Cache Coherence Protocols for Multi-Core Real-Time Systems

Towards zero latency photonic switching in shared memory networks

Routing of asynchronous Clos networks

SAMBA-Bus: A High Performance Bus Architecture for System-on-Chips
