Abstract

Sparse, irregular graphs show up in various applications like linear algebra, machine learning, engineering simulations, robotic control, etc. These graphs have a high degree of parallelism, but their execution on parallel threads of modern platforms remains challenging due to the irregular data dependencies. The execution performance can be improved by efficiently partitioning the graphs such that the communication and thread synchronization overheads are minimized without hurting the utilization of the threads. To achieve this, this paper proposes GRAPHOPT, a tool that models the graph parallelization as a constrained optimization problem and uses the open Google OR-Tools solver to find good partitions. Several scalability techniques are developed to handle large real-world graphs with millions of nodes and edges. Extensive experiments are performed on the graphs of sparse matrix triangular solves (linear algebra) and sum-product networks (machine learning), respectively, showing a mean speedup of 2.0X and 1.8X over previous state-of-the-art libraries, demonstrating the effectiveness of the constrained-optimization-based graph parallelization.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call