Abstract

Most modern architectures are based on a shared-memory design. Correctness of these architectures is ensured by means of coherence protocols and consistency models. However, the performance and scalability of shared-memory systems are usually constrained by the number and size of the messages used to keep the memory subsystem coherent. This matters not only in high-performance computing but also in low-power embedded systems, especially if coherence is required between different components of the system-on-chip. We argue that using the same mechanism to maintain coherence for all memory accesses can be counterproductive, because it incurs unnecessary overhead for addresses that would remain coherent after the access (i.e., private data and read-only shared data). This paper proposes the use of dedicated caches for two different kinds of data: (i) data that can be accessed without contacting other nodes and (ii) modifiable shared data. The private cache (L1P) is independent for each core and stores private data and read-only shared data. The shared cache (L1S), on the other hand, is logically shared but physically distributed among all cores. With this design, we can significantly simplify the coherence protocol, reduce on-chip area requirements, and reduce invalidation time. However, this dedicated cache design requires a classification mechanism to detect the nature of the data being accessed. Results show two drawbacks to this approach: first, the accuracy of the classification mechanism has a huge impact on performance; second, a traditional interconnection network is not optimal for accessing the L1S, increasing register-to-cache latency when accessing shared data. Copyright © 2016 John Wiley & Sons, Ltd.
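The abstract describes routing each access to the L1P or the L1S depending on whether the data is private or read-only shared (L1P-eligible) versus modifiable shared (L1S-bound). As a minimal sketch of what such a classification mechanism could look like, the following hypothetical model tracks, per memory block, the set of accessing cores and whether the block has ever been written; the paper's actual mechanism is not specified in the abstract, so all names and the classification policy here are assumptions:

```python
class DataClassifier:
    """Hypothetical sketch: classify each access as L1P-eligible
    (private, or shared but read-only) or L1S-bound (modifiable shared).
    The real mechanism in the paper may differ."""

    def __init__(self):
        self.accessors = {}   # block address -> set of cores that accessed it
        self.written = set()  # block addresses that have been written at least once

    def access(self, core, block, is_write):
        # Record the accessing core and any write to the block.
        self.accessors.setdefault(block, set()).add(core)
        if is_write:
            self.written.add(block)
        private = len(self.accessors[block]) == 1
        read_only = block not in self.written
        # Private data and read-only shared data go to the per-core L1P;
        # modifiable shared data goes to the distributed L1S.
        return "L1P" if (private or read_only) else "L1S"


c = DataClassifier()
print(c.access(core=0, block=0x100, is_write=False))  # private read -> L1P
print(c.access(core=1, block=0x100, is_write=False))  # read-only shared -> L1P
print(c.access(core=1, block=0x100, is_write=True))   # modifiable shared -> L1S
```

Note that a real implementation would also need to handle reclassification: once a block transitions from L1P-eligible to L1S-bound, copies cached in the L1Ps must be invalidated, which is where classification accuracy affects performance as the abstract points out.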
