Integrated Coherence Prediction

Libo Huang, Yongwen Wang, Nong Xiao, Zhiying Wang, Qiang Dou

doi:10.1145/2611756

Abstract

Multicore architectures with Networks-on-Chip (NoCs) have been widely recognized as the de facto design for efficiently utilizing the continuously increasing density of transistors on a chip. A key challenge in designing such an NoC-based multicore processor is maintaining cache coherence efficiently. Directory-based protocols avoid the bandwidth overhead of snoop-based protocols and therefore scale to a large number of cores. However, conventional directory structures add significant indirection delay to cache-to-cache accesses in larger multicore processors. In this article we propose a novel hardware coherence technique called integrated coherence prediction (ICP). This approach uses prediction to manage shared data and thereby reduce or eliminate the cache-to-cache delay in coherence accesses. Two features distinguish ICP from previous coherence prediction techniques. First, ICP introduces a new integrated prediction scheme that combines two kinds of predictors: an owner predictor, which predicts the data writers and avoids the indirection through the directory, and a data predictor, which predicts the access address and prefetches data directly from remote nodes. Second, ICP uses a request replication method to reduce the negative effect of wrong owner predictions, thus improving overall performance. We present the design and implementation details of the ICP approach. Using detailed full-system simulations, we conclude that ICP provides a cost-effective solution for designing high-performance multicore processors.
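The interplay between owner prediction and request replication described above can be illustrated with a toy latency model. This is only a sketch under stated assumptions, not the ICP hardware design: the last-writer table, the unit hop cost `HOP`, the hop counts, and the reading of request replication as "send a copy of the request to the directory in parallel with the predicted owner" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

HOP = 1  # hypothetical unit latency for one network traversal


@dataclass
class OwnerPredictor:
    """Toy last-writer table: block address -> node last seen writing it."""
    table: dict = field(default_factory=dict)

    def predict(self, addr):
        # Returns None when the table holds no prediction for this address.
        return self.table.get(addr)

    def train(self, addr, writer):
        # Remember the most recent writer of the block.
        self.table[addr] = writer


def miss_latency(predicted, actual_owner, replicate):
    """Hop count for a cache-to-cache miss under this toy model."""
    # Directory indirection: requester -> directory -> owner -> requester.
    baseline = 3 * HOP
    if predicted is None:
        return baseline
    if predicted == actual_owner:
        # Correct guess: requester -> owner -> requester.
        return 2 * HOP
    if replicate:
        # Assumption: a copy of the request went to the directory in
        # parallel, so a wrong guess costs no more than the baseline.
        return baseline
    # Wrong guess without replication: a wasted round trip to the wrong
    # node, followed by the normal directory path.
    return 2 * HOP + baseline
```

In this model a correct prediction saves one hop per miss, while replication bounds the cost of a misprediction at the directory baseline (3 hops instead of 5). It ignores the extra network traffic that replicated requests would generate.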

Full Text