This paper explores the need for asynchronous iteration algorithms as smoothers in multigrid methods. The hardware target for the new algorithms is top-of-the-line, highly parallel hybrid architectures – multicore-based systems enhanced with GPGPUs. These architectures are the most likely candidates for future high-end supercomputers. To pave the road for their effcient use, we must resolve challenges related to the fact that data movement, not floatingpoint operations, is the bottleneck to performance. Our work is in this direction — we designed block-asynchronous multigrid smoothers that perform more flops in order to reduce synchronization, and hence data movement. We show that the extra flops are done for “free,” while synchronization is reduced and the convergence properties of multigrid with classical smoothers like Gauss-Seidel can be preserved.
Read full abstract