Abstract

Striking a space-time balance is essential for a real-world algorithm to achieve high performance on modern shared-memory multi-core and many-core systems. However, a large class of programs with more than O(1) dependency has achieved optimality in either space or time, but not both. In the literature, this problem is known as the fundamental space-time tradeoff. We propose the notion of Processor-Adaptiveness. In contrast to prior Processor-Aware approaches, our approach does not statically partition the problem space across the processor grid, but uses the processor count P only to upper-bound the space and cache requirements in a cache-oblivious fashion. Meanwhile, our processor-adaptive algorithms enjoy the full benefits of dynamic load balancing, which is key to achieving satisfactory speedup on a shared-memory system, especially when the problem dimension n is reasonably larger than P. By utilizing the busy-leaves property of the runtime scheduler and a program-managed memory pool that combines the advantages of stack and heap, we show that our STAR (Space-Time Adaptive and Reductive) technique can help these programs achieve sublinear time bounds while remaining asymptotically work-, space-, and cache-optimal. The key achievement of this paper is the first sublinear O(n^{3/4} log n) time and optimal O(n^3) work GAP algorithm. If we further bound the space and cache requirements of the algorithm to be asymptotically optimal, there is a factor-of-P increase in the time bound without sacrificing the work bound. If P = o(n^{1/4} / log n), the time bound stays sublinear and may offer a better tradeoff between time and space requirements in practice.
