Performance Implication of Tensor Irregularity and Optimization for Distributed Tensor Decomposition

Zheng Miao, Jiajia Li, Jon C Calhoun, Rong Ge

doi:10.1145/3580315

Abstract

Tensors are used by a wide variety of applications to represent multi-dimensional data; tensor decompositions are a class of methods for latent data analytics, data compression, and so on. Many of these applications generate large tensors with irregular dimension sizes and nonzero distribution. CANDECOMP/PARAFAC decomposition (CPD) is a popular low-rank tensor decomposition for discovering latent features. The growing memory and execution-time overhead of CPD for large tensors makes distributed-memory implementations the only feasible solution. The sparsity and irregularity of tensors hinder improvements in the performance and scalability of distributed-memory implementations. While previous works have proven successful for CPD on tensors with relatively regular dimension sizes and nonzero distribution, they either deliver unsatisfactory performance and scalability for irregular tensors or require significant preprocessing time. In this work, we focus on medium-grained tensor distribution to address these limitations for irregular tensors. We first investigate them thoroughly through theoretical and experimental analysis. We show that the main cause of poor CPD performance and scalability is the imbalance of multiple types of computations and communications and the tradeoffs among them; sparsity and irregularity make it challenging to balance them and manage these tradeoffs. The irregularity of a sparse tensor is categorized by two aspects: very different dimension sizes and a non-uniform nonzero distribution. Typically, for irregular tensors, optimizing one type of load imbalance makes the others more severe. To address these challenges, we propose irregularity-aware distributed CPD, which leverages sparsity and irregularity information to identify the best tradeoff between different imbalances with low time overhead.
We materialize the idea with two optimization methods: the prediction-based grid configuration and the matrix-oriented distribution policy, where the former establishes the global balance among computations and communications, and the latter further adjusts the balance among computations. The experimental results show that our proposed irregularity-aware distributed CPD is more scalable and outperforms the medium- and fine-grained distributed implementations by up to 4.4× and 11.4× on 1,536 processors, respectively. Our optimizations support different sparse tensor formats, such as compressed sparse fiber (CSF), coordinate (COO), and hierarchical coordinate (HiCOO), and achieve good scalability for all of them.
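To make the notion of irregularity more concrete, the following is a minimal sketch of how the choice of processor grid changes the nonzero load imbalance under a block (medium-grained style) partition of a COO tensor. It is not the authors' prediction-based grid configuration: it simply enumerates every grid shape by brute force, cuts each mode into equal index ranges, and ignores communication cost entirely. The function names, the synthetic tensor, and the max/mean imbalance metric are all assumptions made for illustration.

```python
import numpy as np

def grid_factorizations(p, order=3):
    """All ordered ways to write p as a product of `order` positive integers."""
    if order == 1:
        return [(p,)]
    result = []
    for d in range(1, p + 1):
        if p % d == 0:
            for rest in grid_factorizations(p // d, order - 1):
                result.append((d,) + rest)
    return result

def nnz_imbalance(coords, dims, grid):
    """Max/mean nonzeros per block when each mode is cut into equal index ranges.

    coords: (nnz, order) integer array of COO indices (hypothetical layout).
    A value of 1.0 means perfectly balanced; larger means more skewed.
    """
    flat = np.zeros(len(coords), dtype=np.int64)
    for mode, (size, parts) in enumerate(zip(dims, grid)):
        block = coords[:, mode] * parts // size  # block index along this mode
        flat = flat * parts + block              # linearize the block coordinates
    counts = np.bincount(flat, minlength=int(np.prod(grid)))
    return counts.max() / counts.mean()

def rank_grids(coords, dims, p):
    """Score every grid shape for p processes, most balanced first."""
    grids = grid_factorizations(p, len(dims))
    return sorted((nnz_imbalance(coords, dims, g), g) for g in grids)

# Synthetic irregular tensor (assumed for illustration): very different
# dimension sizes, and nonzeros skewed toward low indices in mode 0.
rng = np.random.default_rng(0)
dims = (10000, 200, 8)
nnz = 50000
coords = np.column_stack([
    np.minimum((rng.random(nnz) ** 3 * dims[0]).astype(np.int64), dims[0] - 1),
    rng.integers(0, dims[1], nnz),
    rng.integers(0, dims[2], nnz),
])

for score, grid in rank_grids(coords, dims, 16)[:5]:
    print(grid, round(float(score), 2))
```

In this sketch, grids that place many processes along the skewed or very short mode tend to score worse, which mirrors the two aspects of irregularity named in the abstract. A realistic cost model would also have to weigh the communication imbalance that the abstract describes as trading off against computation.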

More From: ACM Transactions on Parallel Computing

Similar Papers

A Krylov-Schur-like method for computing the best rank-(r1,r2,r3) approximation of large and sparse tensors
Lars Eldén ... Maryam Dehghan
Numerical Algorithms | VOL. 91
27 Apr 2022

HOQRI: Higher-Order QR Iteration for Scalable Tucker Decomposition
Yuchen Sun ... Kejun Huang
23 May 2022

A Pipeline Computing Method of SpTV for Three-Order Tensors on CPU and GPU
Wangdong Yang ... Kenli Li
ACM Transactions on Knowledge Discovery from Data | VOL. 13
11 Nov 2019

CSTF
Zachary Blanco ... Maryam Mehri Dehnavi
13 Aug 2018
