Abstract

There is a growing need to perform large computations on small systems, as access to large systems is not widely available and cannot keep up with the scaling of data. BDMPI was recently introduced as a way of achieving this for applications written in MPI. BDMPI allows the efficient execution of standard MPI programs on systems whose aggregate amount of memory is smaller than that required by the computations and significantly outperforms other approaches. In this paper we present a virtual memory subsystem which we implemented as part of the BDMPI runtime. Our new virtual memory subsystem, which we call SBMA, bypasses the operating system virtual memory manager to take advantage of BDMPI's node-level cooperative multi-taking. Benchmarking using a synthetic application shows that for the use cases relevant to BDMPI, the overhead incurred by the BDMPI-SBMA system is amortized such that it performs as fast as explicit data movement by the application developer. Furthermore, we tested SBMA with three different classes of applications and our results show that with no modification to the original MPI program, speedups from 2×--12× over a standard BDMPI implementation can be achieved for the included applications.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call