Large object space support for software distributed shared memory

Wang Leung Benny Cheung

doi:10.5353/th_b3160174

Abstract

Software Distributed Shared Memory (DSM) allows shared memory programming on machines with physically distributed memory. By introducing a virtual shared memory abstraction layer, DSM lets programs closely resemble sequential ones, eliminating the need to insert send and receive statements into the application code. Existing DSM systems can only support a shared memory space smaller than the virtual memory (VM) that the hardware can address for a single process. This prevents scientific applications with large shared memory demands from running under DSM, a problem that previous DSM studies have not satisfactorily addressed. This study proposes a generic solution for supporting a shared object space larger than the VM addressable by the underlying hardware for a single process, and implements it in the LOTS (Large ObjecT Space) DSM system. The challenge is to address more than 2^n bytes of shared objects with an n-bit processor while conforming to the process space usage constraints imposed by the operating system. In LOTS, shared objects about to be accessed are swapped into VM from either the local hard disk or remote machines, while unused ones are swapped out to save process space. Unlike previous solutions in persistent storage systems, which required special compilers to generate and trap addresses for large objects, our solution is triggered automatically by user-level library code through the C++ operator overloading facility. No explicit program code, special compiler, or operating system support is needed, preserving LOTS' good portability and programmability. The solution is also hardware-independent, making it generic and applicable to 32-, 64-, or 128-bit machines for extending the shared object space. In a large object space DSM, large objects can cause I/O performance bottlenecks and external fragmentation in VM. To solve these problems, we introduce segments: small, variable-sized chunks split from large objects.
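The abstract states that swapping is triggered by user-level library code through C++ operator overloading, with no compiler or OS support. The following is a minimal sketch of that idea under stated assumptions, not LOTS' code: `SharedArray`, `segElems`, `maxResident`, `faults()`, and the in-memory map standing in for the local disk are all hypothetical. Segments here are fixed-size (LOTS sizes them variably), and eviction is plain FIFO rather than LOTS' priority-based policy.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch (not LOTS' actual API): an array of T split into
// fixed-size segments, at most `maxResident` of which stay "in VM" at once.
// A map stands in for the local disk; eviction is plain FIFO for brevity.
template <typename T>
class SharedArray {
public:
    SharedArray(std::size_t length, std::size_t segElems, std::size_t maxResident)
        : length_(length), segElems_(segElems), maxResident_(maxResident) {
        assert(segElems > 0 && maxResident > 0);
    }

    // Every element access goes through this operator, so the library can
    // detect a non-resident segment and swap it in transparently.
    // The returned reference is valid only until the next access, which
    // may evict the segment it points into.
    T& operator[](std::size_t i) {
        assert(i < length_);
        return ensureResident(i / segElems_)[i % segElems_];
    }

    std::size_t faults() const { return faults_; }
    std::size_t residentCount() const { return resident_.size(); }

private:
    std::vector<T>& ensureResident(std::size_t seg) {
        auto hit = resident_.find(seg);
        if (hit != resident_.end()) return hit->second;  // already in VM

        ++faults_;
        if (resident_.size() >= maxResident_) swapOut();  // make room first

        std::vector<T>& data = resident_[seg];
        auto onDisk = disk_.find(seg);
        if (onDisk != disk_.end()) {  // swap in from the simulated disk
            data = std::move(onDisk->second);
            disk_.erase(onDisk);
        } else {                      // first touch: zero-filled segment
            data.assign(segElems_, T{});
        }
        fifo_.push_back(seg);
        return data;
    }

    // Evict the oldest resident segment to the simulated disk.
    void swapOut() {
        std::size_t victim = fifo_.front();
        fifo_.pop_front();
        auto it = resident_.find(victim);
        disk_[victim] = std::move(it->second);
        resident_.erase(it);
    }

    std::size_t length_, segElems_, maxResident_;
    std::size_t faults_ = 0;
    std::unordered_map<std::size_t, std::vector<T>> resident_;
    std::unordered_map<std::size_t, std::vector<T>> disk_;
    std::deque<std::size_t> fifo_;
};
```

For example, `SharedArray<int> a(10, 4, 2)` keeps at most two 4-element segments in VM; writing `a[0]`, `a[4]` and `a[8]` in turn causes three segment faults, and the third evicts segment 0. Because the residency check lives in `operator[]`, application code indexes the array as usual.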
Unlike some DSM systems, which split objects mechanically into fixed-size portions, LOTS splits objects into segments intelligently, increasing the chance that each segment fits the shared memory access patterns. This study also devises a new coherence protocol for a large shared object space. In contrast to traditional DSM, the local VM in LOTS cannot hold all segments in the working set, so suitable segments must be chosen to be swapped out to disk. This choice can dramatically affect the number of disk accesses, yet the future memory access pattern cannot be predicted exactly. Our solution features a priority-based swapping protocol, which determines the swapping order of segments according to their access states and thereby reduces the number of swaps between VM and the disks. A migrating-home-based mixed coherence protocol is also developed; it treats lock- and barrier-synchronized segments differently to minimize the number of network broadcast messages and bytes sent.
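The priority-based swapping protocol orders segments for eviction by access state. The sketch below illustrates only that general idea: the three states, their ordering, and the least-recently-accessed tie-break are assumptions for illustration, not LOTS' documented protocol, and `AccessState`, `SegmentInfo`, and `chooseVictim` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical access states for a resident segment, listed from the
// cheapest to the most expensive to swap out. LOTS' actual states and
// ordering may differ.
enum class AccessState {
    Untouched = 0,  // not accessed since the last synchronization point
    ReadOnly = 1,   // read but unmodified: no write-back if a copy exists
    Written = 2     // modified: must be written back to disk on eviction
};

struct SegmentInfo {
    int id;
    AccessState state;
    std::uint64_t lastAccess;  // logical clock value of the latest access
};

// Return the index of the segment to swap out: lowest-priority access
// state first, breaking ties by least recent access.
std::size_t chooseVictim(const std::vector<SegmentInfo>& segs) {
    assert(!segs.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < segs.size(); ++i) {
        const SegmentInfo& cand = segs[i];
        const SegmentInfo& cur = segs[best];
        if (cand.state < cur.state ||
            (cand.state == cur.state && cand.lastAccess < cur.lastAccess)) {
            best = i;
        }
    }
    return best;
}
```

Preferring unmodified segments avoids a disk write on eviction, which is one plausible way an access-state ordering could cut traffic between VM and disk.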
