Abstract

Tensors, or N-dimensional arrays, are increasingly used to represent multi-dimensional data. Sparse tensor decomposition algorithms are of particular interest for analyzing and compressing big datasets because most real-world data is sparse and multi-dimensional. However, state-of-the-art tensor decomposition algorithms do not scale to overwhelmingly large and higher-order sparse tensors on distributed platforms. In this paper, we use the MapReduce model and the Spark engine to implement tensor factorizations on distributed platforms. The proposed algorithm, CSTF (Cloud-based Sparse Tensor Factorization), is a scalable distributed algorithm for decomposing large tensors. It uses the coordinate storage format (COO) to operate on the tensor nonzeros directly, thus eliminating the need for tensor unfolding and the storage of intermediate data. We also propose a novel queuing strategy (QCOO) that exploits the dependency and data reuse between successive tensor operations in tensor decomposition algorithms, and we detail the key-value storage paradigm and the Spark features used to implement the algorithm and the data reuse strategies. The queuing strategy reduces data communication costs over the COO-based implementation by 35% for 3rd-order tensors and by 31% for 4th-order tensors. Compared to the state-of-the-art work, BIGtensor, CSTF achieves 2.2× to 6.9× speedup on 3rd-order tensor decompositions.
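To make the key-value view of COO storage concrete, the sketch below shows one way a 3rd-order sparse tensor could be held as a Spark RDD of ((i, j, k), value) pairs and used to compute a mode-1 MTTKRP, the kernel that dominates CP decomposition, directly on the nonzeros with no unfolding. This is a minimal hypothetical illustration, not the authors' CSTF implementation: the object name, the example data, and the assumption that the factor matrices B and C are small enough to broadcast are all ours.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical minimal sketch of COO-based MTTKRP on Spark; not the CSTF code.
object CooMttkrpSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("COO MTTKRP sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val rank = 2 // CP rank R (illustrative)

    // Sparse 3rd-order tensor in COO format: ((i, j, k), value).
    val nonzeros = sc.parallelize(Seq(
      ((0, 0, 0), 1.0), ((0, 1, 1), 2.0), ((1, 0, 1), 3.0)
    ))

    // Dense factor matrices for modes 2 and 3, broadcast to all workers.
    // B(j, :) and C(k, :) are rows of length `rank` (assumed to fit in memory).
    val B = sc.broadcast(Map(0 -> Array(1.0, 0.5), 1 -> Array(0.2, 1.0)))
    val C = sc.broadcast(Map(0 -> Array(0.3, 0.7), 1 -> Array(1.0, 0.4)))

    // Mode-1 MTTKRP computed on the nonzeros directly (no unfolding):
    // M(i, r) = sum over nonzeros (i, j, k) of X(i, j, k) * B(j, r) * C(k, r).
    val mttkrp = nonzeros
      .map { case ((i, j, k), v) =>
        val row = Array.tabulate(rank)(r => v * B.value(j)(r) * C.value(k)(r))
        (i, row) // key by output row index i
      }
      .reduceByKey((a, b) => a.zip(b).map { case (x, y) => x + y })

    mttkrp.collect().foreach { case (i, row) =>
      println(s"M($i, :) = ${row.mkString(", ")}")
    }

    spark.stop()
  }
}
```

In a full CP-ALS iteration this kernel is repeated once per mode; the queuing strategy described in the paper targets exactly this repetition, reusing intermediate key-value data across the sequence of per-mode operations instead of recomputing it.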
