Parallel Dataflow Research Articles

Modern multi-core and multi-processor computer systems are not used effectively enough on real workloads, even when parallel programming technologies are applied. One way to load computing resources efficiently is to move to computing models and architectures that are inherently parallel. One such architecture is the Parallel Dataflow Computing System (PDCS) "Buran", which implements a dataflow computing model with a dynamically formed context. A distinguishing feature of the dataflow model is that computations are activated by data readiness, which affects both the architecture of the computing system and the way programs are written for it. The differences between the imperative and dataflow programming paradigms also show up in the workflow for creating a program, especially a parallel one, and the dataflow workflow differs markedly from the traditional one. In the very first stage, a parallel algorithm is created and implemented, including the algorithm for generating the initial data. Next, this algorithm is debugged on an emulator or model of the system with a single computing core. After that, a computation distribution function is selected and configured, the program is run without code changes on an emulator or model of the system with several computing cores, and finally the program is debugged in multi-core mode. These stages differ from their counterparts in the traditional workflow in both form and substance. Unlike traditional parallel programs, and even more so sequential ones, a dataflow program can be said to consist in equal parts of the program code that implements the task, the algorithm for generating the initial data, and the computation distribution function. This thesis is demonstrated on the problem of summing the elements of an array, where the implementations differ only in the algorithm for generating the initial data, which radically changes how the dataflow program executes. Careful attention to each part of a dataflow program is the key to solving problems correctly and efficiently on the PDCS, which supports the programmer at the hardware level.
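The three parts named in the abstract (task code, initial-data generation, and a distribution function) can be illustrated with a toy emulator for the array-sum example. This is a minimal sketch in Python under stated assumptions, not the PDCS "Buran" programming model: the names `distribute` and `dataflow_sum`, the `(level, index)` context format, and the pairwise-reduction scheme are all hypothetical and chosen only for illustration.

```python
from collections import deque


def distribute(context, n_cores):
    """Hypothetical distribution function: map a token's context to a core."""
    level, index = context
    return index % n_cores


def dataflow_sum(values, n_cores=1):
    """Sum values by pairwise reduction; an add fires only when both operands arrived.

    Returns (total, fired), where fired[k] counts the additions placed on core k.
    """
    if not values:
        return 0, [0] * n_cores
    # Initial-data generation: one token per element, tagged with context (0, position).
    tokens = deque(((0, i), v) for i, v in enumerate(values))
    widths = {0: len(values)}   # number of tokens expected on each level
    waiting = {}                # target context -> first operand that arrived
    fired = [0] * n_cores
    while tokens:
        (level, index), value = tokens.popleft()
        width = widths[level]
        if width == 1:
            return value, fired  # the only token on the top level is the result
        widths.setdefault(level + 1, (width + 1) // 2)
        target = (level + 1, index // 2)
        if index == width - 1 and width % 2 == 1:
            # Odd element out: it has no partner, so forward it unchanged.
            tokens.append((target, value))
        elif target in waiting:
            # Both operands are ready: the add node fires on the chosen core.
            partner = waiting.pop(target)
            fired[distribute(target, n_cores)] += 1
            tokens.append((target, partner + value))
        else:
            waiting[target] = value
    raise RuntimeError("token queue drained before a result was produced")
```

Calling `dataflow_sum(data, n_cores=1)` and then `dataflow_sum(data, n_cores=4)` runs the same task code in both cases; only the placement of additions recorded in `fired` changes, which mirrors the abstract's point that the multi-core run requires no code changes.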


Applications in data-parallel computing typically consist of multiple stages. In each stage, a set of intermediate parallel data flows (Coflow) is produced and transferred between servers to enable the next stage to start. While there has been much research on scheduling isolated coflows, the dependency between coflows in multi-stage jobs has been largely ignored. In this paper, we consider scheduling coflows of multi-stage jobs represented by general DAGs (Directed Acyclic Graphs) in a shared data center network, so as to minimize the total weighted completion time of jobs. This problem is significantly more challenging than traditional coflow scheduling, as scheduling even a single multi-stage job to minimize its completion time is shown to be NP-hard. We propose a polynomial-time algorithm with an approximation ratio of $O(\mu \log(m)/\log(\log(m)))$, where $\mu$ is the maximum number of coflows in a job and $m$ is the number of servers. For the special case in which the jobs' underlying dependency graphs are rooted trees, we modify the algorithm and improve its approximation ratio. To verify the performance of our algorithms, we present simulation results using real traffic traces that show up to 53% improvement over the prior approach. We conclude the paper by providing a result concerning an optimality gap for scheduling coflows with general DAGs.
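The objective in this abstract, total weighted completion time of jobs whose coflows form a DAG, can be made concrete with a small sketch. The code below is not the paper's algorithm: it ignores network contention entirely and assumes each coflow has a fixed, known duration, so it only yields a critical-path lower bound on each job's completion time. The function names and the `(weight, durations, preds)` job format are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, preds):
    """Finish time of a job if every coflow starts as soon as its predecessors finish.

    durations: coflow name -> assumed transfer time (hypothetical input)
    preds:     coflow name -> iterable of coflows that must finish first
    """
    graph = {c: tuple(preds.get(c, ())) for c in durations}
    finish = {}
    for c in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[c]), default=0)
        finish[c] = start + durations[c]
    return max(finish.values(), default=0)


def weighted_completion_lower_bound(jobs):
    """Sum of weight * critical-path length over jobs given as (weight, durations, preds)."""
    return sum(w * critical_path(d, p) for w, d, p in jobs)
```

A real scheduler must also share port bandwidth among coflows of different jobs, which is what makes the problem hard; this sketch only shows how the DAG dependency enters the completion time.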


Articles published on Parallel Dataflow

A certain examination on heterogeneous systolic array (HSA) design for deep learning accelerations with low power computations

Features of Creating Parallel Programs for the Parallel Dataflow Computing System "Buran"

Transformation of Functional Dataflow Parallel Programs into Imperative Programs

Aspects of Creating Parallel Programs in Dataflow Programming Paradigm

HPVM: Hardware-Agnostic Programming for Heterogeneous Parallel Systems

Systematically Understanding Graph Accelerator Dimensions and the Value of Hardware Flexibility

Scheduling Coflows With Dependency Graph

Dynamic Control of Computation Consistency in the Parallel Dataflow Computing System

Improving The Performances of WSN Using Data Scheduler and Hierarchical Tree

A Measurement-Based Message-Level Timing Prediction Approach for Data-Dependent SDFGs on Tile-Based Heterogeneous MPSoCs

An Efficient and Flexible Accelerator Design for Sparse Convolutional Neural Networks

The System for Transforming the Code of Dataflow Programs into Imperative

Daisy: Data analysis integrated software system for X-ray experiments

OctCNN: A High Throughput FPGA Accelerator for CNNs using Octave Convolution Algorithm

Concurrency Analysis in Dynamic Dataflow Graphs

A Ubiquitous Machine Learning Accelerator With Automatic Parallelization on FPGA

NZESPA: A Near-3D-Memory Zero Skipping Parallel Accelerator for CNNs

Parallel Designs for Metaheuristics that Solve Portfolio Selection Problems Using Fuzzy Outranking Relations

Methods and Approaches to Improving the Reliability of the Parallel Dataflow Computing System

Influence of the Features of the Computing Model and Architecture on the Reliability of the Parallel Dataflow Computing System
