Abstract

Irregular computations over large-scale sparse data are prevalent in critical data-intensive applications, and on modern computer systems they leave significant room for improvement in both parallelism and data locality. We introduce new techniques to efficiently map large irregular computations onto modern multi-core systems with non-uniform memory access (NUMA) behavior. Our techniques apply broadly to irregular computations over multi-dimensional sparse arrays (sparse tensors). We implement a low-overhead static-cum-dynamic task scheduling scheme for effective parallelization of sparse computations, and we add locality-aware optimizations to the scheduler that are driven by the sparsity pattern of the input data. We evaluate our techniques on two popular sparse tensor decomposition methods that have wide applications in data mining, graph analysis, signal processing, and elsewhere. Taking real sparse data sets as input, our techniques not only improve parallel performance but also improve performance scalability with an increasing number of cores: we achieve around 4–5x improvement in performance over existing parallel approaches and observe scalable parallel performance on modern multi-core systems with up to 32 processor cores.
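The abstract does not include code, but the core idea of a static-cum-dynamic scheduling scheme can be sketched as follows: each worker first processes a statically assigned contiguous block of tasks (which favors data locality), then dynamically claims any remaining tasks from a shared counter to balance load across workers. This is a minimal illustrative sketch, not the paper's implementation; all names (`schedule_static_dynamic`, `static_fraction`, `work_fn`) are hypothetical.

```python
import threading

def schedule_static_dynamic(num_tasks, num_workers, static_fraction, work_fn):
    """Hybrid scheduler sketch (illustrative, not the paper's code):
    a static_fraction of tasks is pre-assigned in contiguous blocks,
    and the remainder is claimed dynamically for load balance."""
    static_count = int(num_tasks * static_fraction)
    block = static_count // num_workers
    next_task = [static_count]          # shared counter for the dynamic phase
    lock = threading.Lock()

    def worker(wid):
        # Static phase: a contiguous block per worker favors locality.
        start = wid * block
        end = static_count if wid == num_workers - 1 else start + block
        for t in range(start, end):
            work_fn(t)
        # Dynamic phase: claim leftover tasks one at a time (low overhead,
        # good load balance when task costs are irregular).
        while True:
            with lock:
                t = next_task[0]
                next_task[0] += 1
            if t >= num_tasks:
                break
            work_fn(t)

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
```

In a real NUMA-aware implementation, the static blocks would be chosen so each worker touches data resident on its own memory node, and the dynamic phase would prefer stealing tasks whose data is nearby; this sketch only shows the scheduling skeleton.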
