The cache complexity of multithreaded cache oblivious algorithms

Matteo Frigo,Volker Strumpen

doi:10.1145/1148109.1148157

Abstract

We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache oblivious matrix multiplication incurs O(n3/√Z + (Pn)1/3n2) cache misses when executed by the Cilk scheduler on a machine with P processors, each with a cache of size Z, with high probability. This bound is tighter than previously published bounds. We also present a new multithreaded cache oblivious algorithm for 1D stencil computations, which incurs O(n2/Z+n+√Pn3+e) cache misses with high probability.

Full Text