Reduce Data Communication Research Articles

Distributed computing, such as cloud computing, provides promising platforms for orchestrating the tasks of scientific workflows according to their sequences and dependencies. Workflow scheduling plays an important role in optimizing objectives for distributed computing, such as minimizing makespan and cost. Many researchers have focused on optimizing a single workflow with multiple objectives. There are currently few studies on multi-workflow scheduling, and most of them focus on objectives such as cost and makespan. However, multi-workflow scheduling requires objectives designed to reflect the unique characteristics of multiple workflows. Meanwhile, clustering-based approaches have garnered significant attention in workflow scheduling over distributed computing resources because they reduce data communication among tasks. Despite this, the effectiveness of clustering-based algorithms has not been extensively studied and validated in multi-objective multi-workflow scheduling models. Motivated by these factors, we propose an approach to multi-objective optimization (MOO) for multiple workflows that considers a newly defined metric, fairness. We first formulate fairness mathematically and define a fairness-involved MOO model. Then, we propose an advanced clustering-based resource optimization strategy for multiple workflow runs. Experimental results show that the proposed approach performs better than the compared algorithms without significantly compromising overall makespan and cost or individual fairness, and it can guide simulation workflow scheduling on clouds.

This paper presents GraphAGILE, a domain-specific FPGA-based overlay accelerator for graph neural network (GNN) inference. GraphAGILE consists of (1) a novel unified architecture design with an instruction set, and (2) a compiler built upon the instruction set that can quickly generate optimized code. Due to the proposed instruction set architecture (ISA) and the compiler, GraphAGILE does not require any FPGA reconfiguration when performing inference on various GNN models and input graphs. For the architecture design, we propose a novel hardware module named Adaptive Computation Kernel (ACK) that can execute various computation kernels of GNNs, including general matrix multiplication (GEMM), sparse-dense matrix multiplication (SpDMM), and sampled dense-dense matrix multiplication (SDDMM). The compiler takes the specifications of a GNN model and the graph metadata (e.g., the number of vertices and edges) as input, and generates a sequence of instructions for inference execution. We develop the following compiler optimizations to reduce inference latency: (1) computation order optimization that automatically reorders the computation graph to reduce the total computation complexity, (2) layer fusion that merges adjacent layers to reduce data communication volume, (3) data partitioning with a partition-centric execution scheme that partitions the input graph to fit the available on-chip memory of the FPGA, and (4) kernel mapping that automatically selects the execution mode for ACK and performs task scheduling to overlap computation with data communication and achieve dynamic load balance. We implement GraphAGILE on a state-of-the-art FPGA platform, Xilinx Alveo U250.
GraphAGILE can execute widely used GNN models, including GCN, GAT, GIN, GraphSAGE, SGC and other GNN models supported by GraphGym. Experimental results show that GraphAGILE achieves up to 47.1× (3.9×) reduction in end-to-end latency, including the latency of compilation and hardware execution, compared with the state-of-the-art implementations on CPU (GPU), and achieves up to 2.9× reduction in hardware execution latency compared with the state-of-the-art FPGA accelerators.
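
The idea behind computation order optimization can be illustrated with a minimal sketch. It assumes a GCN-style layer that computes A·X·W, where A is a sparse adjacency matrix with one nonzero per edge, X is the vertex feature matrix, and W is the weight matrix. Because matrix multiplication is associative, (A·X)·W and A·(X·W) give the same result but cost a different number of multiply-accumulate operations. The cost model and all function names below are illustrative assumptions and are not taken from GraphAGILE's compiler or instruction set.

```python
def spdmm(edges, num_vertices, dense):
    """Sparse-dense product. `edges` holds (row, col, weight) triples of A."""
    width = len(dense[0])
    out = [[0.0] * width for _ in range(num_vertices)]
    for row, col, weight in edges:
        for j in range(width):
            out[row][j] += weight * dense[col][j]
    return out


def gemm(a, b):
    """Dense matrix product of row-major nested lists."""
    inner, width = len(b), len(b[0])
    return [[sum(r[k] * b[k][j] for k in range(inner)) for j in range(width)]
            for r in a]


def order_costs(num_vertices, num_edges, f_in, f_out):
    """Multiply-accumulate counts for the two evaluation orders (assumed model)."""
    transform = num_vertices * f_in * f_out  # dense X*W or (A*X)*W
    return {
        "aggregate_first": num_edges * f_in + transform,   # (A*X)*W
        "transform_first": transform + num_edges * f_out,  # A*(X*W)
    }


def gcn_layer(edges, num_vertices, x, w):
    """Evaluate A*X*W in whichever order the cost model says is cheaper."""
    f_in, f_out = len(w), len(w[0])
    costs = order_costs(num_vertices, len(edges), f_in, f_out)
    if costs["aggregate_first"] <= costs["transform_first"]:
        return gemm(spdmm(edges, num_vertices, x), w), "aggregate_first"
    return spdmm(edges, num_vertices, gemm(x, w)), "transform_first"


# Usage: a 3-vertex graph whose layer shrinks features from 4 to 2 columns.
edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 0.5), (2, 1, 0.5)]
x = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 2.0], [4.0, 0.0, 1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [2.0, 0.5]]
output, chosen_order = gcn_layer(edges, 3, x, w)
print(chosen_order, output)
```

Under this cost model the dense transform costs the same in both orders, so the choice reduces to comparing the input and output feature widths: aggregation is applied on whichever side has fewer feature columns.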

Articles published on Reduce Data Communication

Signal Processing Method to Reduce Data Communication Delay between Computer and PLC

Clustering-based multi-objective optimization considering fairness for multi-workflow scheduling on clouds

Particle-resolved thermal lattice Boltzmann simulation using OpenACC on multi-GPUs

GraphAGILE: An FPGA-Based Overlay Accelerator for Low-Latency GNN Inference

Towards an Effective Service Allocation in Fog Computing

A Fast Parallel Random Forest Algorithm Based on Spark

Distributed data filtering and modeling for fog and networked manufacturing

Fed-IoUT: Opportunities and Challenges of Federated Learning in the Internet of Underwater Things

Sustainable and Efficient Fog-Assisted IoT Cloud Based Data Collection and Delivery for Smart Cities

Exploring Scalable, Distributed Real-Time Anomaly Detection for Bridge Health Monitoring

An Efficient Distributed SPARQL Query Processing Scheme Considering Communication Costs in Spark Environments

On the Test Particle Monte-Carlo method to solve the steady state Boltzmann equation, the congruity of its results with experiments and its potential for shared memory parallelism

Minimizing Training Time of Distributed Machine Learning by Reducing Data Communication

A less computationally complex clustering algorithm based on dynamic K-means for increasing lifetime of wireless sensor networks

A Wearable Bio-signal Processing System with Ultra-low-power SoC and Collaborative Neural Network Classifier for Low Dimensional Data Communication

Green content communications in 6LoWPAN

An improved algorithm for generalized least squares estimation

A new compression technique in MANET: compressed-LZW algorithm

Energy- and locality-efficient multi-job scheduling based on MapReduce for heterogeneous datacenter

Cloud-Edge Network Data Processing based on User Requirements using Modify MapReduce Algorithm and Machine Learning Techniques

