Optimal Cache-Oblivious Mesh Layouts

Michael A. Bender, Bradley C. Kuszmaul, Kebin Wang, Shang-Hua Teng

doi:10.1007/s00224-009-9242-2

Abstract

A mesh is a graph that divides physical space into regularly shaped regions. Mesh computations form the basis of many applications, including finite-element methods, image rendering, collision detection, and N-body simulations. In one important mesh primitive, called a mesh update, each mesh vertex stores a value and repeatedly updates this value based on the values stored in all neighboring vertices. The performance of a mesh update depends on the layout of the mesh in memory. Informally, if the mesh layout has good data locality (most edges connect a pair of nodes that are stored near each other in memory), then a mesh update runs quickly.

This paper shows how to find a memory layout that guarantees that the mesh update has asymptotically optimal memory performance for any set of memory parameters. Specifically, the cost of the mesh update is roughly the cost of a sequential memory scan. Such a memory layout is called cache-oblivious. Formally, for a $d$-dimensional mesh $G$, block size $B$, and cache size $M$ (where $M = \Omega(B^d)$), the mesh update of $G$ uses $O(1 + |G|/B)$ memory transfers. The paper also shows how the mesh-update performance degrades for smaller caches, where $M = o(B^d)$.

The paper then gives two algorithms for finding cache-oblivious mesh layouts. The first layout algorithm runs in time $O(|G| \log^2 |G|)$ both in expectation and with high probability on a RAM. It uses $O(1 + |G| \log^2(|G|/M)/B)$ memory transfers in expectation and $O(1 + (|G|/B)(\log^2(|G|/M) + \log |G|))$ memory transfers with high probability in the cache-oblivious and disk-access machine (DAM) models. The layout is obtained by finding a fully balanced decomposition tree of $G$ and then performing an in-order traversal of the leaves of the tree.

The second algorithm computes a cache-oblivious layout on a RAM in time $O(|G| \log |G| \log\log |G|)$ both in expectation and with high probability. In the DAM and cache-oblivious models, the second layout algorithm uses $O(1 + (|G|/B) \log(|G|/M) \min\{\log\log |G|, \log(|G|/M)\})$ memory transfers in expectation and $O(1 + (|G|/B)(\log(|G|/M) \min\{\log\log |G|, \log(|G|/M)\} + \log |G|))$ memory transfers with high probability. The algorithm is based on a new type of decomposition tree, here called a relax-balanced decomposition tree. Again, the layout is obtained by performing an in-order traversal of the leaves of the decomposition tree.
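
As a rough illustration of why a layout read off the leaves of a decomposition tree can improve locality, the following Python sketch lays out a small two-dimensional grid mesh. It is only a sketch under stated assumptions: it uses plain alternating coordinate bisection as a stand-in for the decomposition tree, not the fully balanced or relax-balanced trees of the paper, and it counts edges whose endpoints fall in different memory blocks as a crude proxy for locality, not as the paper's memory-transfer analysis. The names `grid_mesh`, `bisection_layout`, and `cross_block_edges` are hypothetical.

```python
from itertools import product


def grid_mesh(n):
    """Return (vertices, edges) of an n x n grid mesh.

    Vertices are (row, col) pairs listed in row-major order.
    """
    vertices = list(product(range(n), repeat=2))
    edges = []
    for r, c in vertices:
        if r + 1 < n:
            edges.append(((r, c), (r + 1, c)))
        if c + 1 < n:
            edges.append(((r, c), (r, c + 1)))
    return vertices, edges


def bisection_layout(vertices, axis=0):
    """Order vertices by an in-order traversal of the leaves of a
    decomposition tree.

    Assumption: the tree is built by alternating coordinate bisection,
    a simple stand-in and not the construction used in the paper.
    """
    if len(vertices) <= 1:
        return list(vertices)
    # Split the vertex set in half along the current axis.
    ordered = sorted(vertices, key=lambda v: (v[axis], v[1 - axis]))
    mid = len(ordered) // 2
    other = 1 - axis
    # Left subtree's leaves come first, then the right subtree's leaves.
    return bisection_layout(ordered[:mid], other) + bisection_layout(ordered[mid:], other)


def cross_block_edges(layout, edges, block_size):
    """Count edges whose endpoints are stored in different memory blocks."""
    position = {v: i for i, v in enumerate(layout)}
    return sum(
        1
        for u, v in edges
        if position[u] // block_size != position[v] // block_size
    )


if __name__ == "__main__":
    vertices, edges = grid_mesh(16)
    tree_layout = bisection_layout(vertices)
    print("row-major:", cross_block_edges(vertices, edges, 16))
    print("bisection:", cross_block_edges(tree_layout, edges, 16))
```

On the 16 x 16 grid with 16 vertices per block, the row-major order leaves 240 edges crossing block boundaries, while the bisection order leaves 96, because each block then holds a 4 x 4 tile of the grid. The layout function never looks at the block size, which is the sense in which such a layout is oblivious to the memory parameters.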
