Redundant linked list based cache coherence protocol

Qiang Li, S. Vlaovic

doi:10.1109/ftpds.1994.494473

Abstract

This article presents a distributed directory-based cache coherence protocol that improves performance and facilitates error recovery in large-scale multiprocessors. A number of distributed directory-based protocols, such as the Scalable Coherent Interface (SCI, ANSI/IEEE Std 1596), use a linked list structure to maintain cache coherence. While they work well for small to medium-sized systems, the list traversal overhead becomes high when the system grows to thousands of processors. The system is also vulnerable to single-node failure, because recovery from such a failure involves all the processors in the system. Single-node failures can occur relatively often when such a protocol is applied to SCI-based Local Area MultiProcessors (LAMP), where individual nodes are autonomous computers that can be powered up and down independently. We propose an enhancement to the linked list approach. A redundant spanning list is constructed when the list is built, which achieves two goals: 1) the list traversal time is reduced from O(N) to O(√N), and 2) recovery from a single-node failure is confined to the processors in the affected list, unless the head of the list is lost.
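The abstract does not describe how the spanning list is laid out, so the following is only a minimal sketch of one way an O(√N) traversal can arise, under stated assumptions. It assumes the head and every k-th entry of an N-entry sharing list (k = ⌈√N⌉) are linked by extra "spanning" pointers, and that each spanning node walks its own segment of the ordinary list in parallel once the head's message reaches it. It counts only sequential hops on the critical path, ignoring message latency, contention, and the failure-recovery role of the redundant pointers. All function names are hypothetical.

```python
import math


def spacing(n: int) -> int:
    """ceil(sqrt(n)) for n >= 1, computed with integer arithmetic."""
    if n < 1:
        raise ValueError("a sharing list needs at least one entry")
    return math.isqrt(n - 1) + 1


def spanning_positions(n: int) -> list[int]:
    """Positions in an n-entry sharing list that also sit on the spanning list.

    Assumption (not from the paper): the head (position 0) and every k-th
    entry after it, with k = ceil(sqrt(n)), carry redundant spanning pointers.
    """
    return list(range(0, n, spacing(n)))


def linear_hops(n: int) -> int:
    """Sequential hops to walk a plain linked sharing list from head to tail."""
    return n - 1


def spanning_hops(n: int) -> int:
    """Critical-path hops when the head walks the spanning list and each
    spanning node walks its own segment of the sharing list in parallel."""
    k = spacing(n)
    finish = 0
    for j, start in enumerate(spanning_positions(n)):
        seg_len = min(start + k, n) - start  # entries in this node's segment
        # j hops along the spanning list to reach this spanning node,
        # then seg_len - 1 hops to reach the last entry of its segment.
        finish = max(finish, j + seg_len - 1)
    return finish


for n in (16, 1000, 4096):
    print(n, linear_hops(n), spanning_hops(n))
# 16 15 6
# 1000 999 61
# 4096 4095 126
```

When N = k² is a perfect square, this layout gives a critical path of (k − 1) + (k − 1) = 2√N − 2 hops, compared with N − 1 hops for the plain list, which is the O(N) to O(√N) improvement the abstract describes.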
