A task-uncoordinated distributed dataflow model for scalable high performance parallel program execution

Lucas A Wilson, Jeffery Von Ronne

doi:10.1016/j.parco.2015.10.013

Abstract

We describe a novel model for executing distributed memory parallel programs using uncoordinated tasks. We describe several off-line optimizations for the proposed model. We examine the effects of these optimizations on modern processors with wider vector units. Increasing levels of task coalescence can improve throughput and increase performance. Increases in performance are observed in both single-node and multi-node experiments.

We propose a distributed dataflow execution model that uses a distributed dictionary for data memoization, allowing each parallel task to schedule instructions without direct inter-task coordination. We provide a description of the proposed model, including autonomous dataflow task selection. We also describe a set of optimization strategies that improve the overall throughput of stencil programs executed using this model on modern multi-core and vectorized architectures.
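The central idea of the abstract, tasks that pick ready instructions by consulting a shared dictionary rather than by messaging each other, can be illustrated with a small sketch. This is not the authors' implementation: a plain in-process Python `dict` stands in for the distributed dictionary, a round-robin loop stands in for parallel tasks, and every name (`Instruction`, `build_stencil_program`, `task_step`, `run`) is hypothetical. The three-point periodic stencil is likewise only an assumed example workload.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int]  # (time step, cell index); an assumed key scheme


class Instruction:
    """A dataflow instruction: writes `out` once all `inputs` are available."""

    def __init__(self, out: Key, inputs: List[Key], fn: Callable[..., int]):
        self.out = out
        self.inputs = inputs
        self.fn = fn


def build_stencil_program(n_cells: int, n_steps: int) -> List[Instruction]:
    """Three-point stencil with periodic boundaries: new = left + centre + right."""
    program = []
    for t in range(1, n_steps + 1):
        for i in range(n_cells):
            inputs = [(t - 1, (i - 1) % n_cells), (t - 1, i), (t - 1, (i + 1) % n_cells)]
            program.append(Instruction((t, i), inputs, lambda a, b, c: a + b + c))
    return program


def task_step(task_id: int, n_tasks: int, program: List[Instruction],
              store: Dict[Key, int]) -> bool:
    """One autonomous scheduling attempt: fire the first ready instruction found.

    The task looks only at the shared store. It never communicates with
    other tasks; each task simply starts scanning at a different offset.
    """
    n = len(program)
    start = task_id * n // n_tasks
    for k in range(n):
        ins = program[(start + k) % n]
        if ins.out in store:
            continue  # result already memoized, nothing to do
        if all(key in store for key in ins.inputs):
            store[ins.out] = ins.fn(*(store[key] for key in ins.inputs))
            return True
    return False


def run(initial: List[int], n_steps: int, n_tasks: int) -> List[int]:
    """Interleave the tasks until no task can fire another instruction."""
    store: Dict[Key, int] = {(0, i): v for i, v in enumerate(initial)}
    program = build_stencil_program(len(initial), n_steps)
    progress = True
    while progress:
        progress = False
        for task_id in range(n_tasks):
            if task_step(task_id, n_tasks, program, store):
                progress = True
    return [store[(n_steps, i)] for i in range(len(initial))]


if __name__ == "__main__":
    print(run([1, 0, 0, 0], n_steps=2, n_tasks=3))
```

Because every instruction is a pure function of memoized inputs, the final values in this sketch do not depend on how many tasks run or in what order they fire; only the presence of operands in the store gates execution.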

Full Text