Abstract

We describe a novel model for executing distributed-memory parallel programs using uncoordinated tasks. We describe several off-line optimizations for the proposed model. We examine the effects of these optimizations on modern processors with wider vector units. Increasing levels of task coalescence can improve throughput and increase performance. Increases in performance are observed in both single-node and multi-node experiments.

We propose a distributed dataflow execution model that uses a distributed dictionary for data memoization, allowing each parallel task to schedule instructions without direct inter-task coordination. We provide a description of the proposed model, including autonomous dataflow task selection. We also describe a set of optimization strategies that improve the overall throughput of stencil programs executed using this model on modern multi-core and vectorized architectures.
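To make the idea of dictionary-based memoization with autonomous task selection more concrete, the following Python sketch runs a three-point averaging stencil on a periodic 1D grid. It is an illustration under stated assumptions, not the implementation described in the paper: a plain in-process `dict` stands in for the distributed dictionary, tasks are simulated by picking one at random per step, and all names (`run`, `select_ready`, `dependencies`, `stencil`) are hypothetical. Each task looks only at the dictionary to decide what it can compute next and never exchanges messages with another task.

```python
import random


def stencil(left, center, right):
    """Three-point average; an assumed example kernel."""
    return (left + center + right) / 3.0


def dependencies(t, i, n):
    """Keys of the three inputs of cell i at step t (periodic boundary)."""
    return [(t - 1, (i - 1) % n), (t - 1, i), (t - 1, (i + 1) % n)]


def select_ready(memo, pending, n):
    """Return one pending key whose inputs are all memoized, or None."""
    for key in pending:
        if all(dep in memo for dep in dependencies(*key, n)):
            return key
    return None


def run(initial, steps, num_tasks, seed=0):
    n = len(initial)
    # Stand-in for the distributed dictionary: (step, cell) -> value.
    memo = {(0, i): v for i, v in enumerate(initial)}
    # Assumed work split: task k owns every cell i with i % num_tasks == k.
    pending = [
        [(t, i) for t in range(1, steps + 1) for i in range(n)
         if i % num_tasks == k]
        for k in range(num_tasks)
    ]
    rng = random.Random(seed)
    while any(pending):
        # Tasks advance in an arbitrary order; none waits on a peer directly.
        k = rng.randrange(num_tasks)
        key = select_ready(memo, pending[k], n)
        if key is None:
            continue  # nothing ready for this task yet; try another
        t, i = key
        left, center, right = (memo[d] for d in dependencies(t, i, n))
        memo[key] = stencil(left, center, right)
        pending[k].remove(key)
    return [memo[(steps, i)] for i in range(n)]


if __name__ == "__main__":
    print(run([0.0, 3.0, 6.0, 9.0], steps=2, num_tasks=3))
```

Because every value is written once under a unique `(step, cell)` key, the result does not depend on the order in which tasks happen to run; a task may race ahead several time steps on its own cells as long as the inputs it needs are already in the dictionary.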
