Abstract

Dataflow programming consists of developing a program by describing its sequential stages and the interactions between them. The runtime systems that support this kind of programming are responsible for exploiting the parallelism, concurrently executing the different stages as soon as their dependencies are met. In this paper we introduce a new parallel programming model and framework based on the dataflow paradigm. It presents a new combination of features that makes it easy to map programs to shared or distributed memory, exploiting data locality and affinity to obtain the same performance as optimized coarse-grained MPI programs. These features include: a single one-tier model that supports hybrid shared- and distributed-memory systems with the same abstractions; the ability to express arbitrarily linked activities, including non-nested cycles; an internal distributed work-stealing mechanism that supports Multiple-Producer/Multiple-Consumer configurations; and a runtime mechanism for reconfiguring dependences and communication channels that also allows the creation of task-to-task data affinities. We present an evaluation using examples from different classes of applications. Experimental results show that programs generated using this framework deliver good performance in hybrid distributed- and shared-memory environments, with a development effort similar to that of other shared-memory-oriented dataflow programming models.

Highlights

  • The most common programming tools for parallel machines are based on message passing libraries, such as Message Passing Interface (MPI) [1], or shared memory APIs like OpenMP [2]

  • In this work we propose a novel combination of features for dataflow programming models: (a) A single one-tier representation for shared- and distributed-memory architectures; (b) Description of a program as a reconfigurable network of arbitrarily interconnected activities and typed data containers, with a generic system to represent distributed Multiple-Producer/Multiple-Consumer (MPMC) configurations; (c) Support for dependence structures that involve non-nested feedback loops; (d) A mechanism to reconfigure dependences at runtime without creating new tasks; and (e) A mechanism to intuitively express task-to-task affinities, allowing better exploitation of data locality across state-driven activities. A minimal shared-memory sketch of the MPMC idea in (b) appears after this list

  • The messages of one transition could consume all the buffer memory, preventing another transition from performing its communications. This opens the possibility of a deadlock that halts the progression of the whole network. The problem can be solved using new features proposed for MPI-4, such as Allocate Receive communications [10], which allocate memory internally for incoming messages to eliminate buffering overhead when receiving unknown-size messages, and Communication Endpoints [11], which allow the threads inside a process to communicate as if they were separate ranks. A related pattern already available in current MPI is sketched below
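
Allocate Receive is still a proposal, but the matched-probe mechanism standardized in MPI-3 already lets a receiver size its buffer exactly for an unknown-size message, avoiding worst-case buffering. The helper below is a minimal sketch in plain MPI, not the framework's API; recv_unknown_size is our own illustrative name:

    #include <mpi.h>
    #include <vector>

    // Receive a message whose size is not known in advance without
    // pre-allocating a worst-case buffer: probe for a matched message,
    // query its exact size, allocate, then receive. This is the standard
    // MPI-3 matched-probe pattern; the MPI-4 "Allocate Receive" proposal
    // cited above would move the allocation inside the MPI library itself.
    std::vector<double> recv_unknown_size(int source, int tag, MPI_Comm comm) {
        MPI_Message msg;
        MPI_Status status;
        MPI_Mprobe(source, tag, comm, &msg, &status);  // match, but do not receive yet

        int count = 0;
        MPI_Get_count(&status, MPI_DOUBLE, &count);    // exact payload element count

        std::vector<double> buffer(count);
        MPI_Mrecv(buffer.data(), count, MPI_DOUBLE, &msg, &status);
        return buffer;
    }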

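To make feature (b) of the highlights concrete, the following self-contained C++ sketch implements a tiny shared-memory analogue of an MPMC configuration: one typed data container (a "place") fed by two producer activities and drained by two consumers. The names Place, put, and get are our own illustrative choices, not the framework's API; the framework generalizes this pattern across processes using distributed work-stealing.

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <optional>
    #include <queue>
    #include <thread>
    #include <vector>

    // A typed "place" shared by several producer and consumer activities:
    // a minimal shared-memory MPMC analogue of the model described above.
    template <typename T>
    class Place {
        std::queue<T> items;
        std::mutex m;
        std::condition_variable cv;
        bool closed = false;
    public:
        void put(T v) {
            { std::lock_guard<std::mutex> lk(m); items.push(std::move(v)); }
            cv.notify_one();
        }
        void close() {                      // producers are done
            { std::lock_guard<std::mutex> lk(m); closed = true; }
            cv.notify_all();
        }
        std::optional<T> get() {            // blocks until an item arrives or the place closes
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !items.empty() || closed; });
            if (items.empty()) return std::nullopt;
            T v = std::move(items.front());
            items.pop();
            return v;
        }
    };

    int main() {
        Place<int> place;
        std::vector<std::thread> producers, consumers;
        for (int p = 0; p < 2; ++p)         // two producer activities
            producers.emplace_back([&place, p] {
                for (int i = 0; i < 4; ++i) place.put(p * 100 + i);
            });
        for (int c = 0; c < 2; ++c)         // two consumer activities
            consumers.emplace_back([&place, c] {
                while (auto v = place.get()) std::printf("consumer %d got %d\n", c, *v);
            });
        for (auto& t : producers) t.join(); // wait for producers, then close the place
        place.close();
        for (auto& t : consumers) t.join();
    }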

Summary

Introduction

The most common programming tools for parallel machines are based on message-passing libraries, such as MPI [1], or shared-memory APIs like OpenMP [2]. Experimental work has been carried out to show that programs generated using our framework achieve good performance in comparison with manually developed implementations using both message-passing libraries such as MPI and state-of-the-art tools for parallel dataflow programming, like FastFlow [3] or CnC [4]. Tasks implemented as functions of different modes in the same transition are mutually exclusive and are executed by the same thread, so they can share data structures. The second phase is a backtracking search that starts from the bottom-right element, with each task working on a part of the matrix obtained in the first phase. As shown, it is also possible to create a network that models this kind of problem without using modes. A table with the API methods can be found in [9]
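
The mutual exclusion of modes described above can be pictured with a small, self-contained C++ sketch: two task functions registered on one transition are dispatched by a single thread, one at a time, so they can share a data structure without extra synchronization. The names ModalTransition, addMode, and fire are illustrative assumptions, not the framework's actual methods (see the API table in [9]).

    #include <cstdio>
    #include <functional>
    #include <map>
    #include <string>

    // Several task functions ("modes") registered on one transition and
    // dispatched sequentially by the same thread, so both can safely
    // share the same state (here, `matrix`) without locks.
    class ModalTransition {
        std::map<std::string, std::function<void()>> modes;
    public:
        void addMode(const std::string& name, std::function<void()> task) {
            modes[name] = std::move(task);
        }
        void fire(const std::string& name) { modes.at(name)(); }
    };

    int main() {
        int matrix[4] = {0, 0, 0, 0};       // state shared by both modes
        ModalTransition t;
        t.addMode("forward",   [&] { for (int& x : matrix) x += 1; });
        t.addMode("backtrack", [&] { std::printf("last cell: %d\n", matrix[3]); });
        t.fire("forward");                  // first phase: fill the matrix
        t.fire("backtrack");                // second phase: search from the end
    }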

Building transitions
Building the network
Mapping
Targeting both shared and distributed systems
Distributed places
Work-stealing
Benchmarks
Performance study
Mandelbrot set
Smith-Waterman
Code complexity
Related work
Conclusions