Abstract

The amount of scientific data is currently growing at an unprecedented pace, with tensors being a common data form that exhibits high-order, high-dimensional, and sparse characteristics. While tensor-based analysis methods are effective, the vast increase in data size has made processing the original tensor infeasible. Tensor decomposition offers a solution by factoring the tensor into multiple low-rank matrices or tensors that tensor-based analysis methods can use efficiently. One such algorithm is the Tucker decomposition, which decomposes an N-order tensor into N low-rank factor matrices and a low-rank core tensor. However, many Tucker decomposition techniques generate large intermediate variables and require significant computational resources, rendering them inadequate for processing high-order and high-dimensional tensors. This paper introduces FasterTucker decomposition, a novel approach to tensor decomposition that builds on FastTucker decomposition, a variant of the Tucker decomposition. We propose an efficient parallel FasterTucker decomposition algorithm, called cuFasterTucker, designed to run on a GPU platform. Our algorithm has low storage and computational requirements and provides an effective solution for high-order and high-dimensional sparse tensor decomposition. Compared to state-of-the-art algorithms, our approach achieves a speedup of approximately 7 to 23 times.
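To make the Tucker structure described above concrete, the following is a minimal NumPy sketch (not the paper's cuFasterTucker implementation; the tensor sizes and ranks are illustrative assumptions) showing how a 3-order tensor is reconstructed from a small core tensor and one factor matrix per mode, and how few parameters the decomposed form needs relative to the full tensor.

```python
# Hedged sketch, not the paper's code: Tucker reconstruction of a 3-order
# tensor from a low-rank core and N = 3 factor matrices using NumPy.
import numpy as np

rng = np.random.default_rng(0)

# Low-rank core tensor G and one factor matrix per mode (all ranks set to 2
# here as an illustrative assumption).
G = rng.standard_normal((2, 2, 2))   # core tensor
U1 = rng.standard_normal((4, 2))     # mode-1 factor matrix
U2 = rng.standard_normal((5, 2))     # mode-2 factor matrix
U3 = rng.standard_normal((6, 2))     # mode-3 factor matrix

# The chain of mode-n products G x1 U1 x2 U2 x3 U3, written as one einsum:
# each core index is contracted with the rank index of its factor matrix.
X = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)

# Storage comparison: decomposed form vs. the full reconstructed tensor.
compressed_params = G.size + U1.size + U2.size + U3.size  # 8 + 8 + 10 + 12
full_params = X.size                                      # 4 * 5 * 6
```

The reconstructed tensor `X` has shape (4, 5, 6), yet the decomposition stores only 38 parameters instead of 120; this gap widens rapidly with the order and dimensionality of the tensor, which is the motivation for Tucker-style methods on large sparse data.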
