Abstract

How can we extract hidden relations from a tensor and a matrix data simultaneously in a fast, accurate, and scalable way? Coupled matrix-tensor factorization (CMTF) is an important tool for this purpose. Designing an accurate and efficient CMTF method has become more crucial as the size and dimension of real-world data are growing explosively. However, existing methods for CMTF suffer from lack of accuracy, slow running time, and limited scalability. In this paper, we propose S3CMTF, a fast, accurate, and scalable CMTF method. In contrast to previous methods which do not handle large sparse tensors and are not parallelizable, S3CMTF provides parallel sparse CMTF by carefully deriving gradient update rules. S3CMTF asynchronously updates partial gradients without expensive locking. We show that our method is guaranteed to converge to a quality solution theoretically and empirically. S3CMTF further boosts the performance by carefully storing intermediate computation and reusing them. We theoretically and empirically show that S3CMTF is the fastest, outperforming existing methods. Experimental results show that S3CMTF is up to 930× faster than existing methods while providing the best accuracy. S3CMTF shows linear scalability on the number of data entries and the number of cores. In addition, we apply S3CMTF to Yelp rating tensor data coupled with 3 additional matrices to discover interesting patterns.

Highlights

  • Given a tensor data, and related matrix data, how can we analyze them efficiently? Tensors and matrices are natural representations for various real world high-order data [1, 2, 3]

  • We propose S3CMTF (Sparse, lock-free stochastic gradient descent (SGD) based, and Scalable Coupled matrix-tensor factorization (CMTF)), a CMTF method which resolves the problems of previous methods

  • We show that asynchronously parallel stochastic gradient descent (SGD) is useful for S3CMTF in multi-core shared memory systems without expensive locking

Read more

Summary

Introduction

Related matrix data, how can we analyze them efficiently? Tensors (i.e., multi-dimensional arrays) and matrices are natural representations for various real world high-order data [1, 2, 3]. CMTF-Tucker-ALS and CMTF-OPT undergo high reconstruction error since the former is not applicable to sparse data, and the latter focuses only on CP model and cannot be generalized to the Tucker model. Both methods are sequential and hard to take benefit of multi-core parallelization. S3CMTF provides parallel, sparse CMTF based on Tucker factorization unlike previous methods which do not support sparse tensors or cannot be parallelized. Discovery: Applying S3CMTF on Yelp review dataset with a 3-mode tensor (user, business, time) coupled with 3 additional matrices ((user, user), (business, category), and (business, city)), we observe interesting patterns and clusters of businesses and suggest a process for personal recommendation. IN ), and matrices are denoted by boldface capitals (e.g. A). xα and aβ denote the entry of X and A with indices α and β, respectively

À nfAg
Objective function & gradient
Experiments
Methods
Findings
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call