Smart Memory and Network-On-Chip Design for High-Performance Shared-Memory Chip Multiprocessors

Mario Lodde

doi:10.4995/thesis/10251/35325

Abstract

The cache hierarchy and the Network-on-Chip (NoC) are two key components of chip multiprocessors (CMPs). Most of NoC tra?c is due to messages exchanged by the caches according to the coherence protocol. The amount of tra?c, the percentage of short and long messages and the tra?c pattern in general depend on the cache geometry and the coherence protocol. NoC architecture and the cache hierarchy are indeed tightly coupled, and these two components should be designed and evaluated together to study how varying one?s design a?ects the other one?s performance. Furthermore, each component should adjust to match the requirements and exploit the performance of the other one, and vice versa. Usually, messages belonging to di?erent classes are sent through di?erent virtual networks or through NoCs with di?erent bandwidth, thus separating short and long messages. However, other classi?cation of the messages can be done, depending on the type of information they provide: some messages, like data requests, need ?elds to store information (block address, type of request, etc.); other messages, like acknowledgement messages (ACKs), do not need to specify any information except for the destination node. This second class of messages do no require high NoC bandwidth: latency is far more important, since the destination node is typically blocked waiting for their reception. In this thesis we propose a dedicated network which is able to transmit this second class of messages; the dedicated network is lightweight and fast, and is able to deliver ACKs in a few clock cycles. By reducing ACKs latency and the NoC tra?c, it is possible to: ? speed-up the invalidation phase during write requests in a system which employs a directory-based coherence protocol. ? improve the performance of a broadcast-based coherence protocol, reaching performance which is comparable to that of a directory-based protocol but without the additional area overhead due to the directory. ? implement an e?cient and dynamic mapping of cache blocks to the last-level cache banks, aiming to map blocks as close as possible to the cores which use them. The ?nal goal is to obtain a co-design of the NoC and the cache hierarchy which minimizes the scalability problems due to coherence protocols. In this thesis we explore the di?erent design alternatives for fast network delivery and coherence protocol opportunities. The best mechanisms, combined on a ?nal system, allow for a truly dynamic and customizable architecture in an environment with multiple applications demanding partitioning of resources.

Full Text