Abstract

Canonical Polyadic Decomposition (CPD) is a powerful technique for uncovering multilinear relationships in tensors. Current research in scalable CPD has focused on designing efficient decomposition algorithms for large sparse tensors that arise in machine learning and data mining applications. This work addresses the complementary need for efficient decomposition algorithms for large dense tensors that arise in signal processing applications. Such tensors are often highly skewed, with one mode (e.g., time) orders of magnitude larger than the others. We present an algorithm appropriate for MapReduce settings that uses both regularization and sketching to operate efficiently on such tensors. We have open-sourced an Apache Spark implementation of the algorithm and evaluate it on synthetic and real datasets to characterize the trade-offs in runtime and accuracy when using different types and combinations of regularization and sketching. We observe that a combination of random entry sketching plus Tikhonov regularization works best regardless of the type or level of noise in the tensor. Similarly, we find that random entry sketching plus proximal regularization works best for ill-conditioned tensors. Further experiments demonstrate that the runtime scales sublinearly with the tensor size and highly sublinearly with the tensor rank. The use of regularization and sketching yields runtimes that are 42-112× faster than those of the previous state-of-the-art MapReduce CPD implementation for large dense, skewed tensors, while having a negligible impact on the accuracy of the decompositions.
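To make the combination of sketching and Tikhonov regularization concrete, the following is a minimal single-machine NumPy sketch of one alternating-least-squares factor update for a rank-R CPD of a 3-way tensor. It is not the paper's Spark implementation. It assumes that sketching means drawing a uniform random subset of the rows of the least-squares system, which may differ from the paper's random entry sketching, and the names `khatri_rao`, `sketched_tikhonov_update`, `lam`, and `n_samples` are hypothetical.

```python
import numpy as np


def khatri_rao(F1, F2):
    """Column-wise Kronecker product: row p*Q + q equals F1[p] * F2[q]."""
    P, R = F1.shape
    Q, _ = F2.shape
    return (F1[:, None, :] * F2[None, :, :]).reshape(P * Q, R)


def sketched_tikhonov_update(unfolding, F1, F2, lam=1e-3, n_samples=None, rng=None):
    """Solve min_F ||unfolding - F @ Z.T||^2 + lam * ||F||^2 with Z = khatri_rao(F1, F2).

    If n_samples is given, only a uniform random subset of the columns of
    `unfolding` (and the matching rows of Z) enters the problem. This row
    sampling is an assumed stand-in for the sketching used in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    Z = khatri_rao(F1, F2)                          # shape (P*Q, R)
    if n_samples is not None and n_samples < Z.shape[0]:
        idx = rng.choice(Z.shape[0], size=n_samples, replace=False)
        Z, unfolding = Z[idx], unfolding[:, idx]
    R = Z.shape[1]
    gram = Z.T @ Z + lam * np.eye(R)                # Tikhonov-regularized normal equations
    return np.linalg.solve(gram, Z.T @ unfolding.T).T


# Skewed synthetic tensor: the first ("time") mode is much larger than the others.
rng = np.random.default_rng(0)
I, J, K, R = 2000, 8, 8, 3
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
X = np.einsum("ir,jr,kr->ijk", A, B, C)             # exact rank-R tensor

# Update the small mode-2 factor from 500 of the I*K = 16000 available rows.
X2 = X.transpose(1, 0, 2).reshape(J, I * K)         # mode-2 unfolding
B_hat = sketched_tikhonov_update(X2, A, C, lam=1e-8, n_samples=500, rng=rng)
```

In this toy setup the update for the small factor touches only 500 rows of a 16000-row system, which is where a skewed tensor benefits most from sampling. A full decomposition would cycle such updates over all three modes until convergence.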
