Distributed Shared Memory Research Articles

To support a global virtual memory space, an architecture must translate virtual addresses dynamically. In current processors, the translation is done in a TLB (translation lookaside buffer), before or in parallel with the first-level cache access. As processor technology improves at a rapid pace and the working sets of new applications grow insatiably, the latency and bandwidth demands on the TLB are difficult to meet, especially in multiprocessor systems, which run larger applications and are plagued by the TLB consistency problem. We describe and compare five options for virtual address translation in the context of distributed shared memory (DSM) multiprocessors, including CC-NUMAs (cache-coherent non-uniform memory access architectures) and COMAs (cache-only memory architectures). In CC-NUMAs, moving the TLB to shared memory is a bad idea because page placement, migration, and replication are all constrained by the virtual page address, which greatly affects processor node access locality. In the context of COMAs, the allocation of pages to processor nodes is not as critical because memory blocks can dynamically migrate and replicate freely among nodes. As the address translation is done deeper in the memory hierarchy, the frequency of translations drops because of the filtering effect. We also observe that the TLB is very effective when it is merged with the shared memory, because of the sharing and prefetching effects and because there is no need to maintain TLB consistency. Even if the effectiveness of the TLB merged with the shared memory is very high, we also show that the TLB can be removed in a system with address translation done in memory because the frequency of translations is very low.
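
The filtering effect mentioned in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes a hypothetical direct-mapped, virtually indexed and tagged cache, and the function name `count_translations`, the cache geometry, and the access trace are all made up for illustration. It counts how many translations would be needed if they happened in front of the cache (one per access) versus below it (one per miss).

```python
def count_translations(addresses, cache_lines=64, line_size=32):
    """Return (translations_at_processor, translations_below_cache).

    Toy model with assumed parameters: a direct-mapped cache that is
    indexed and tagged with virtual addresses, so hits need no translation.
    """
    tags = [None] * cache_lines
    at_processor = 0
    below_cache = 0
    for addr in addresses:
        at_processor += 1            # a TLB in front of the cache sees every access
        block = addr // line_size
        index = block % cache_lines
        if tags[index] != block:     # miss: the request travels toward memory
            tags[index] = block
            below_cache += 1         # only misses need translating further down
    return at_processor, below_cache


# Hypothetical trace: sweep a 1 KiB array ten times, reading 4-byte words.
trace = [a for _ in range(10) for a in range(0, 1024, 4)]
front, deep = count_translations(trace)
print(front, deep)
```

With this trace the array fits in the cache, so only the first sweep misses and the translation count below the cache is far smaller than the access count. Real workloads and cache organizations would of course give different ratios.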

Distributed shared memory (DSM) multiprocessors typically require disjoint networks for deadlock-free execution of cache coherence protocols. This is normally achieved by implementing virtual networks with the help of virtual channels or virtual lanes multiplexed on a single physical network. To keep the coherence protocol simple, messages are usually assigned to virtual lanes in a predefined static manner based on a cycle-free lane assignment dependence graph. However, this static split of virtual networks (such as request and reply networks) may lead to underutilization of certain virtual networks while saturating the other networks. In this paper, we explore different static and dynamic schemes to select the virtual lanes for outgoing messages and mix the load among them without restricting any particular type of message to be carried only by a particular virtual network. We achieve this by exposing the selection algorithms to the coherence protocol itself, so that it can inject messages into selected virtual lanes based on some local information, and still enjoy deadlock-freedom. Our execution-driven simulation on five applications from the SPLASH-2 suite shows that as the system scales, the virtual network selection algorithms play an important role. For 128-node systems, our dynamic selection algorithm speeds up parallel execution by as much as 22 percent over an optimized baseline system running a modified SGI Origin 2000 protocol. We also explore how network latency, the number of message buffers per virtual lane, and the depth of network interface output queues affect the relative performance of various virtual lane selection algorithms.
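
To make the contrast between static and dynamic lane assignment concrete, here is a minimal sketch. It is not the paper's algorithm: the function names, the two-lane setup, the buffer counts, and the "most free buffers" heuristic are assumptions chosen for illustration, and the model ignores buffer draining and the conditions needed for deadlock freedom.

```python
def static_lane(msg_type):
    # Assumed static split: requests on lane 0, replies on lane 1.
    return {"request": 0, "reply": 1}[msg_type]


def dynamic_lane(free_buffers):
    # Assumed heuristic using local information only: pick the lane with
    # the most free buffers; ties go to the lowest lane index.
    return max(range(len(free_buffers)),
               key=lambda lane: (free_buffers[lane], -lane))


def inject(messages, policy, lanes=2, buffers_per_lane=4):
    """Count messages that find no free buffer in their chosen lane."""
    free = [buffers_per_lane] * lanes
    stalled = 0
    for msg_type in messages:
        if policy == "static":
            lane = static_lane(msg_type)
        else:
            lane = dynamic_lane(free)
        if free[lane] > 0:
            free[lane] -= 1          # message occupies one buffer
        else:
            stalled += 1             # lane is saturated
    return stalled


# Hypothetical request-heavy burst with no draining in between.
burst = ["request"] * 6 + ["reply"] * 2
print(inject(burst, "static"), inject(burst, "dynamic"))
```

In this toy burst the static split saturates the request lane while the reply lane sits half empty, whereas the dynamic choice spreads the load over both lanes. A real protocol would also have to guarantee that mixing message types across lanes cannot create a dependence cycle.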

Articles published on Distributed Shared Memory

Distributed Shared Memory for Roaming Large Volumes

A Transparent Distributed Shared Memory for Clustered Symmetric Multiprocessors

Selective optical broadcast component for reconfigurable multiprocessor interconnects

Optimal propagation-based protocols implementing causal memories

Truss: A Reliable, Scalable Server Architecture

Towards a more efficient implementation of OpenMP for clusters via translation to global arrays

A hierarchical distributed-shared memory parallel Branch&Bound application with PVM and OpenMP for multiprocessor clusters

In-kernel integration of operating system and infiniband functions for high performance computing clusters: a DSM example

Moving address translation closer to memory in distributed shared-memory multiprocessors

Simulation tools to study a distributed shared memory for clusters of symmetric multiprocessors

Memory sharing for interactive ray tracing on clusters

Software Distributed Shared Memory: a VIA-based implementation and comparison of sequential consistency with home-based lazy release consistency

A cluster-based secure active network environment

JIACKPT: A Recoverable Software Distributed Shared Memory System

Design of a High-Speed Optical Interconnect for Scalable Shared-Memory Multiprocessors

Novel Communication Strategy for Distributed-Shared Memory Parallel Computer

A common framework for inter-process communication on a cluster

RAPID: Reconfigurable and Scalable All-Photonic Interconnect for Distributed Shared Memory Multiprocessors

PIPORS: a parallel input parallel output register switching system

Exploring virtual network selection algorithms in DSM cache coherence protocols
