Effective Tensor-Based Data Clustering Through Sub-Tensor Impact Graphs

Xinsheng Li, Shengyu Huang, Maria Luisa Sapino, K. Selçuk Candan

doi:10.1007/978-3-319-97864-2_7

Abstract

Tensors are commonly used to represent multi-modal data, such as Web graphs, sensor streams, and social networks. Consequently, tensor-based algorithms, most notably tensor decomposition, are becoming core tools for data analysis and knowledge discovery, including clustering. Intuitively, the tensor decomposition process generalizes matrix decomposition to high-dimensional arrays (known as tensors) and rewrites the given tensor as a set of factor matrices (one for each mode of the input tensor) and a core tensor (which, intuitively, describes the spectral structure of the given tensor). These factor matrices and the core tensor can then be used to obtain multi-modal clusters of the input data. One key problem with tensor decomposition, however, is its computational complexity. One way to deal with this challenge is to partition the tensor and obtain the tensor decomposition by leveraging these smaller partitions. This solution, however, leaves an important question open: how to combine the results from these partitions most effectively. In this chapter, we introduce the notion of sub-tensor impact graphs (SIGs), which quantify how the decompositions of these sub-partitions impact each other and the overall tensor decomposition accuracy, and we present several complementary algorithms that leverage this novel concept to address various key challenges in tensor decomposition: (a) the Personalized Tensor Decomposition (PTD) algorithm leverages sub-tensor impact graphs to focus the accuracy of the tensor decomposition process on the parts of the data tensor that are most relevant to a particular clustering task; (b) the noise-profile adaptive tensor decomposition (nTD) method leverages limited a priori information about the noise distribution in the data to improve tensor decomposition accuracy; and (c) a two-phase block-incremental tensor decomposition technique, BICP, efficiently and effectively maintains tensor decomposition results in the presence of incrementally evolving tensor data. We also present experimental results, with diverse data sets, showing that, if properly constructed, sub-tensor impact graphs can indeed help overcome various density and noise challenges in the clustering of multi-modal data sets.
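To make the notion of "factor matrices and a core tensor" concrete, the following minimal sketch decomposes a small dense tensor into a Tucker-style form using a truncated higher-order SVD (HOSVD). This is a standard textbook technique chosen here for illustration; it is not the PTD, nTD, or BICP algorithm of this chapter, and the helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) as well as the random 4 x 5 x 6 input are assumptions introduced only for this example.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along the given mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hosvd(tensor, ranks):
    """Truncated HOSVD: one factor matrix per mode plus a core tensor."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-wise unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    # Project the tensor onto the factor subspaces to obtain the core.
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_product(core, factor.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild an approximation of the tensor from core and factors."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_product(out, factor, mode)
    return out


# Hypothetical input: a random dense 3-mode tensor.
rng = np.random.default_rng(0)
X = rng.random((4, 5, 6))

# Full ranks give an exact reconstruction (up to floating-point error).
core, factors = hosvd(X, ranks=(4, 5, 6))
X_hat = reconstruct(core, factors)

# Smaller ranks give a compressed, approximate representation.
small_core, small_factors = hosvd(X, ranks=(2, 2, 2))
X_small = reconstruct(small_core, small_factors)
rel_error = np.linalg.norm(X - X_small) / np.linalg.norm(X)
```

Each row of a factor matrix describes one entity of the corresponding mode in the reduced space, so these rows are what a downstream clustering step would typically operate on. The cost of computing the SVD of each unfolding grows quickly with tensor size, which is the kind of cost that motivates partition-based approaches.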

Full Text