Abstract

Given multiple data sets of relational data that share a number of dimensions, how can we efficiently decompose our data into the latent factors? Factorization of a single matrix or tensor has attracted much attention, e.g., in the Netflix challenge, with users rating movies. However, we often have additional side information, e.g., demographic data about the users in the Netflix example above. Incorporating the additional information leads to the coupled factorization problem. So far, it has been solved only for relatively small datasets. We provide a distributed, scalable method for decomposing matrices, tensors, and coupled data sets through stochastic gradient descent on a variety of objective functions. We offer the following contributions: (1) Versatility: Our algorithm can perform matrix, tensor, and coupled factorization, with flexible objective functions including the Frobenius norm, Frobenius norm with ℓ1-induced sparsity, and non-negative factorization. (2) Scalability: FlexiFaCT scales to unprecedented sizes in both the data and model, with up to billions of parameters. FlexiFaCT runs on standard Hadoop. (3) Convergence proofs showing that FlexiFaCT converges on the variety of objective functions, even with projections.

Highlights

  • How can we efficiently mine data that capture relations between different entities? Suppose, for instance, that we are given a time-evolving social network, such as Facebook, and we have information about who messages whom, or who becomes friends with whom, and when

  • We offer the following contributions: (1) Versatility: Our algorithm can perform matrix, tensor, and coupled factorization, with flexible objective functions including the Frobenius norm, Frobenius norm with ℓ1-induced sparsity, and non-negative factorization

  • FlexiFaCT is very fast and scalable; we show how to implement it on Hadoop, and how to achieve high speeds by distributing both the data and the parameters

Summary

Introduction

Suppose, for instance, that we are given a time-evolving social network, such as Facebook, and we have information about who messages whom, or who becomes friends with whom, and when. This data may be formulated as a three-mode tensor. Suppose also that we have some side information pertaining to the users, e.g., demographic information. This problem can be formulated as an instance of a so-called coupled factorization, where the two pieces of [...]. We propose FlexiFaCT, a flexible and highly scalable distributed factorization algorithm which attacks a very broad spectrum of problems: FlexiFaCT can handle matrices, tensors, and coupled tensor-matrix settings, combined with a variety of loss functions and constraints, including Frobenius norm, KL divergence, ℓ1 regularization, and non-negativity constraints. FlexiFaCT includes several recent methods [9], [11] as special cases.
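
To make the coupled setting concrete, the following is a minimal single-machine sketch, not the FlexiFaCT implementation itself. It assumes a three-mode tensor (user, user, time) and a user-by-attribute side matrix that share the user factor matrix A, a squared-error (Frobenius) loss over observed entries, and plain stochastic gradient descent with an optional projection onto the non-negative orthant. The function names, learning rate, rank, and initialization are all illustrative assumptions; none of the distributed Hadoop machinery is shown.

```python
import numpy as np

def coupled_loss(tensor_entries, matrix_entries, A, B, C, D):
    """Sum of squared errors over observed tensor and side-matrix entries."""
    t = sum((np.sum(A[i] * B[j] * C[k]) - x) ** 2 for i, j, k, x in tensor_entries)
    m = sum((A[i] @ D[d] - y) ** 2 for i, d, y in matrix_entries)
    return float(t + m)

def coupled_sgd(tensor_entries, matrix_entries, shape, n_side,
                rank=2, lr=0.05, epochs=100, nonneg=False, seed=0):
    """Hypothetical helper: SGD for a tensor X[i,j,k] ~ sum_r A[i,r]B[j,r]C[k,r]
    coupled with a matrix Y[i,d] ~ sum_r A[i,r]D[d,r] through the shared factor A."""
    rng = np.random.default_rng(seed)
    I, J, K = shape
    A = rng.uniform(0.1, 0.5, (I, rank))       # shared mode (e.g. users)
    B = rng.uniform(0.1, 0.5, (J, rank))
    C = rng.uniform(0.1, 0.5, (K, rank))
    D = rng.uniform(0.1, 0.5, (n_side, rank))  # side-information mode
    samples = [("t", e) for e in tensor_entries] + [("m", e) for e in matrix_entries]
    for _ in range(epochs):
        for idx in rng.permutation(len(samples)):
            kind, e = samples[idx]
            if kind == "t":
                i, j, k, x = e
                a, b, c = A[i].copy(), B[j].copy(), C[k].copy()
                err = np.sum(a * b * c) - x
                # Gradient step on the three rows touched by this entry.
                A[i] -= lr * err * b * c
                B[j] -= lr * err * a * c
                C[k] -= lr * err * a * b
                if nonneg:
                    B[j] = np.maximum(B[j], 0.0)
                    C[k] = np.maximum(C[k], 0.0)
            else:
                i, d, y = e
                a, dv = A[i].copy(), D[d].copy()
                err = a @ dv - y
                A[i] -= lr * err * dv
                D[d] -= lr * err * a
                if nonneg:
                    D[d] = np.maximum(D[d], 0.0)
            if nonneg:
                # Projection step: clip the shared row back to non-negative values.
                A[i] = np.maximum(A[i], 0.0)
    return A, B, C, D
```

Because each update touches only one row per factor matrix, entries that share no row indices can be processed independently; a distributed variant could exploit this, though how FlexiFaCT partitions the work is described in the paper's MapReduce section rather than here.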

Usability and Reproducibility
Related Work and Background
FlexiFaCT Approach
Proof of convergence with projections
MapReduce Implementation of FlexiFaCT
Experiments