Reducing Coherence Overhead Research Articles

Recent proposals classify memory accesses as private or shared in order to process private accesses more efficiently and reduce coherence overhead. The classification mechanisms previously proposed either cannot adapt to the dynamic sharing behavior of applications or require frequent broadcast messages. Additionally, most of these classification approaches assume single-level translation lookaside buffers (TLBs), so deeper and more efficient TLB hierarchies, such as the ones implemented in current commodity processors, have not been appropriately explored. This paper analyzes accurate classification mechanisms in multilevel TLB hierarchies. In particular, we propose an efficient data classification strategy for systems with distributed shared last-level TLBs. Our approach classifies data accounting for temporal private accesses and constrains TLB-related traffic by issuing unicast messages on first-level TLB misses. When our classification is employed to deactivate coherence for private data in directory-based protocols, it improves the directory efficiency and, consequently, reduces coherence traffic to merely 53.0 percent, on average. Additionally, it avoids some of the overheads of previous classification approaches for purely private TLBs, improving average execution time by nearly 9 percent for large-scale systems.
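The general idea of private/shared classification described in this abstract can be illustrated with a minimal sketch. It is not the paper's TLB-based mechanism: the names `PageClassifier`, `PageEntry`, `access`, and `needs_coherence` are hypothetical, and the sketch assumes a simple page-granularity table in which a page is private to the first core that touches it and becomes shared when any other core accesses it.

```python
from dataclasses import dataclass, field
from typing import Dict

PRIVATE = "private"
SHARED = "shared"


@dataclass
class PageEntry:
    owner: int            # core that first touched the page
    state: str = PRIVATE  # every page starts out private


@dataclass
class PageClassifier:
    """Hypothetical page-level classifier (illustration only)."""
    table: Dict[int, PageEntry] = field(default_factory=dict)

    def access(self, core: int, page: int) -> str:
        """Record an access and return the page's classification."""
        entry = self.table.get(page)
        if entry is None:
            # First touch: the page is private to the accessing core.
            self.table[page] = PageEntry(owner=core)
            return PRIVATE
        if entry.state == PRIVATE and entry.owner != core:
            # A second core touched the page: it becomes shared.
            entry.state = SHARED
        return entry.state

    def needs_coherence(self, page: int) -> bool:
        """Assumption: only shared pages need coherence tracking."""
        entry = self.table.get(page)
        return entry is not None and entry.state == SHARED


clf = PageClassifier()
clf.access(core=0, page=0x10)   # first touch by core 0: private
clf.access(core=1, page=0x10)   # second core: page becomes shared
print(clf.needs_coherence(0x10))
```

In this sketch a page never returns to the private state once it is shared, which is the kind of non-adaptive behavior the abstract attributes to some earlier mechanisms.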

In this work, we characterized the memory performance of a shared-bus shared-memory multiprocessor running a DSS workload, in particular the impact of coherence overhead and process migration. When the number of processors is increased to achieve higher computational power, the bus becomes a major bottleneck of such an architecture. We evaluated solutions that can greatly reduce that bottleneck. Database systems are one area where this kind of optimization is important. For this reason, we considered a DSS workload and set up the experiments following TPC-D specifications on the PostgreSQL DBMS, in order to explore different optimizations on the same kind of workloads as evaluated in the literature. In this scenario, we compare possible solutions to boost performance and show the impact of process migration on coherence overhead. We found that the consequences of coherence overhead and process migration on performance are very important in machines with 16 or more processors. In this case, even little sharing, as in DSS applications, can become crucial for system performance. Another important result of our analysis concerns the interaction between the coherence protocol and the scheduler. Basic cache-affinity scheduling is useful in reducing migration, but it is not effective in every load condition. Specific coherence protocols can help reduce the effects of process migration, especially when the scheduler cannot apply the affinity requirement. In these conditions, a write-update protocol with a selective invalidation strategy for private data improves performance (and scalability) by about 20% with respect to a classical MESI-based solution. This advantage is about 50% in the case of high cache-to-cache transfer.

Articles published on Reducing Coherence Overhead

TLB-Based Temporality-Aware Classification in CMPs with Multilevel TLBs

Efficient TLB-Based Detection of Private Pages in Chip Multiprocessors

Reducing coherence overhead and boosting performance of high-end SMP multiprocessors running a DSS workload
