Abstract

Tensors are multi-dimensional mathematical objects that can model complex relationships and can be decomposed for analytical purposes. They are used in a wide range of data mining applications. In social network analysis, tensor decompositions give interesting insights by taking multiple characteristics of the data into account. However, because such data follow a power-law distribution, the decomposition reveals only the strongest signals, which hide information of interest that has a lower intensity. To reveal this hidden information, we propose a method that stratifies the signal by gathering clusters of similar intensity in each stratum. It is an iterative process in which the CANDECOMP/PARAFAC (CP) decomposition is applied and its result is used to deflate the tensor, i.e., to remove from the tensor the clusters found by the decomposition. As the CP decomposition is computationally demanding, its algorithm must also be optimized so that it can be applied to large-scale data within a reasonable execution time, even with the several executions required by the iterative stratification process. Therefore, we propose an algorithm that uses both dense and sparse data structures and that leverages coarse-grained and fine-grained optimizations, in addition to incremental computations, to achieve large-scale CP tensor decomposition. Our implementation outperforms the baseline of large-scale CP decomposition libraries by several orders of magnitude. We validate our stratification method and our optimized algorithm on a Twitter dataset about COVID vaccines.
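
The iterative process described above (decompose, deflate, repeat) can be illustrated with a small sketch. This is not the optimized algorithm of the paper: it is a naive, dense, three-way CP decomposition by alternating least squares in NumPy, and every function name (`cp_als`, `deflate`, `stratify`) is hypothetical. The deflation rule used here, subtracting the reconstructed tensor and clipping negative values to zero, is an assumption, since the abstract does not specify how the clusters are removed.

```python
import numpy as np

def khatri_rao(a, b):
    # Column-wise Kronecker product of (I x R) and (J x R) -> (I*J x R).
    return (a[:, None, :] * b[None, :, :]).reshape(-1, a.shape[1])

def unfold(x, mode):
    # Mode-n unfolding: rows index `mode`, columns index the other modes.
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def reconstruct(factors):
    # Sum of R rank-one tensors built from the three factor matrices.
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def cp_als(x, rank, n_iter=50, seed=0):
    # Naive dense CP-ALS for a 3-way tensor (illustrative, not optimized).
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(x, mode) @ kr @ np.linalg.pinv(gram)
    return factors

def deflate(x, factors):
    # ASSUMPTION: remove the found clusters by subtracting the
    # reconstruction and clipping negative residuals to zero.
    return np.clip(x - reconstruct(factors), 0.0, None)

def stratify(x, rank, n_strata):
    # Each stratum is the CP decomposition of the current residual tensor.
    strata, residual = [], x.astype(float)
    for _ in range(n_strata):
        factors = cp_als(residual, rank)
        strata.append(factors)
        residual = deflate(residual, factors)
    return strata

# Toy tensor: one strong cluster and one weak cluster on disjoint indices.
x = np.zeros((6, 6, 6))
x[:3, :3, :3] = 100.0
x[3:, 3:, 3:] = 1.0
strata = stratify(x, rank=1, n_strata=2)
```

In this toy example, a single rank-1 decomposition is dominated by the strong cluster, and the weak cluster only becomes visible in the second stratum, after deflation. A large-scale setting would need sparse data structures and the optimizations the paper proposes, which this sketch does not attempt.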
