MICCO: An Enhanced Multi-GPU Scheduling Framework for Many-Body Correlation Functions

Qihan Wang,Robert G Edwards,Bin Ren,Jie Chen

doi:10.1109/ipdps53621.2022.00022

Abstract

Calculation of many-body correlation functions is one of the critical kernels utilized in many scientific computing areas, especially in Lattice Quantum Chromodynamics (Lattice QCD). It is formalized as a sum of a large number of contraction terms, each of which can be represented by a graph whose vertices describe quarks inside a hadron node and whose edges designate quark propagations at specific time intervals. Because these calculations are computation- and memory-intensive, real-world physics systems (e.g., multi-meson or multi-baryon systems) explored by Lattice QCD prefer to leverage multiple GPUs. Unlike general graph processing, many-body correlation function calculations show two specific features: a large number of computation- and data-intensive kernels, and frequently repeated appearances of original and intermediate data. The former results in expensive memory operations such as tensor movements and evictions. The latter offers data reuse opportunities that mitigate the data-intensive nature of the calculations. However, existing graph-based multi-GPU schedulers cannot capture these data-centric features, which results in sub-optimal performance for many-body correlation function calculations. To address this issue, this paper presents a multi-GPU scheduling framework, MICCO, that accelerates contractions for correlation functions by taking the data dimension (e.g., data reuse and data eviction) into account. This work first performs a comprehensive study of the interplay between data reuse and load balance, and introduces two new concepts, local reuse pattern and reuse bound, to study the opportunity of achieving the optimal trade-off between them. Based on this study, MICCO proposes a heuristic scheduling algorithm and a machine-learning-based regression model to generate the optimal setting of reuse bounds. MICCO is integrated into Redstar, a real-world Lattice QCD system, enabling it to run on multiple GPUs for the first time. The evaluation demonstrates that MICCO outperforms other state-of-the-art works, achieving up to 2.25× speedup on synthesized datasets and 1.49× speedup on real-world correlation functions.
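The trade-off between data reuse and load balance can be illustrated with a small sketch. This is not MICCO's algorithm, which the abstract does not spell out. It is a hypothetical greedy scheduler written under stated assumptions: each contraction is a triple of two input tensor names and one output name, load is counted as the number of contractions per GPU, and `reuse_bound` is read as the maximum load imbalance tolerated in exchange for placing a contraction where its inputs already reside. The names `Gpu` and `schedule` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    load: int = 0  # contractions assigned so far (assumed load metric)
    resident: set = field(default_factory=set)  # tensors already on the device


def schedule(contractions, num_gpus, reuse_bound):
    """Greedily map each (input_a, input_b, output) contraction to a GPU index.

    Hypothetical heuristic: a GPU is eligible only if its load exceeds the
    lightest load by at most `reuse_bound`; among eligible GPUs, pick the one
    that already holds the most inputs, breaking ties by lower load, then
    by lower index.
    """
    gpus = [Gpu() for _ in range(num_gpus)]
    placement = []
    for a, b, out in contractions:
        min_load = min(g.load for g in gpus)
        eligible = [i for i, g in enumerate(gpus)
                    if g.load - min_load <= reuse_bound]
        best = max(
            eligible,
            key=lambda i: (len({a, b} & gpus[i].resident), -gpus[i].load, -i),
        )
        gpus[best].load += 1
        # Inputs are now on the device and the output is produced there.
        gpus[best].resident |= {a, b, out}
        placement.append(best)
    return placement


# Toy workload: intermediate "AB" and originals "A", "B", "D" are reused.
toy = [("A", "B", "AB"), ("AB", "C", "ABC"), ("A", "D", "AD"), ("B", "D", "BD")]
print(schedule(toy, num_gpus=2, reuse_bound=0))
print(schedule(toy, num_gpus=2, reuse_bound=2))
```

With a bound of 0 the toy workload alternates between the two devices (`[0, 1, 0, 1]`), giving perfect balance with little reuse. With a bound of 2 the first three contractions stay on GPU 0 where their inputs reside (`[0, 0, 0, 1]`). Choosing the bound per workload is the step for which the abstract describes a regression model.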
