Abstract

Various digital data sets that encode user-item relationships contain a multilevel overlapping cluster structure. The user-item relation can be encoded in a weighted bipartite graph, and uncovering these overlapping coclusters of users and items at multiple levels in the bipartite graph can play an important role in analyzing user-item data in many applications. For example, in online marketing tasks such as placing ads or deploying targeted marketing strategies, identifying co-occurring clusters of users and items can lead to accurately targeted advertisements and better marketing outcomes. In this paper, we propose fast algorithms inspired by algebraic multigrid methods for finding multilevel overlapping cocluster structures of feature matrices that encode user-item relations. Starting from the weighted bipartite graph structure of the feature matrix, the algorithms use agglomeration procedures to recursively coarsen the bipartite graphs that represent the relations between the coclusters on increasingly coarse levels. New fast coarsening routines are described that circumvent the bottleneck of all-to-all similarity computations by exploiting measures of direct connection strength between row and column variables in the feature matrix. Providing accurate coclusters at multiple levels in a manner that scales to large data sets is a challenging task. To address it, the proposed heuristic algorithms approximately and recursively minimize normalized cuts to obtain coclusters in the aggregated bipartite graphs on multiple levels of resolution.
Whereas the main novelty and focus of the paper lie in algorithmic aspects of reducing computational complexity to obtain scalable methods specifically for large rectangular user-item matrices, the algorithmic variants also define several new models for determining multilevel coclusters, which we justify intuitively by relating them to principles that underlie collaborative filtering methods for user-item relationships. Experimental results show that the proposed algorithms successfully uncover the multilevel overlapping cluster structure for artificial and real data sets.

Summary of Contribution: This paper develops new and efficient computational methods for finding the multilevel overlapping cocluster structure of feature matrices that encode user-item relationships. We base our approach on the use of pairwise similarity measures between features, seeking clusters of points that are similar to each other and dissimilar from the points outside the cluster. We approximately solve the problem of finding optimal overlapping coclusters on multiple levels by employing a framework that is based on efficient multilevel methods that have been used previously to solve sparse linear systems and to cluster graphs. Our main contribution is that we extend these methods efficiently to find coclusters in the bipartite graphs that encode common and important user-item relationships or social network relations. The novel methods that we propose are inherently scalable to large problem sizes and are naturally able to uncover overlapping coclusters at multiple levels, whereas existing methods generally find coclusters only at the fine level. We illustrate the algorithm and its performance on standard test problems from the literature and on a proof-of-concept real-world data set that relates LinkedIn users to their skills and expertise.
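
As a small illustration of the normalized-cut objective mentioned in the abstract, the sketch below evaluates the standard two-way normalized cut of a single cocluster in the weighted bipartite graph of a feature matrix. The abstract does not state the exact objective or its overlapping, multilevel form, so this is only an assumed textbook variant for one non-overlapping split; the function name `cocluster_ncut` and the toy matrix are hypothetical and are not taken from the paper.

```python
import numpy as np

def cocluster_ncut(A, rows, cols):
    """Two-way normalized cut of the cocluster (rows, cols).

    A is a nonnegative user-item feature matrix (users x items), read as
    the weighted bipartite graph with edge weight A[i, j] between user i
    and item j. Assumed textbook definition, not the paper's exact model.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    r = np.zeros(m, dtype=bool)
    r[list(rows)] = True              # users inside the cocluster
    c = np.zeros(n, dtype=bool)
    c[list(cols)] = True              # items inside the cocluster

    # Weight of edges crossing the boundary: inside users to outside items,
    # plus outside users to inside items.
    cut = A[r][:, ~c].sum() + A[~r][:, c].sum()

    # Volume = total weighted degree of the nodes on each side.
    vol_in = A[r].sum() + A[:, c].sum()
    vol_out = A[~r].sum() + A[:, ~c].sum()
    if vol_in == 0 or vol_out == 0:
        return float("inf")           # degenerate split
    return cut / vol_in + cut / vol_out

# Toy matrix (hypothetical): users 0-1 use items 0-1, user 2 uses items 2-3.
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])

good = cocluster_ncut(A, rows=[0, 1], cols=[0, 1])  # follows the block structure
bad = cocluster_ncut(A, rows=[0], cols=[0])         # splits a block
```

On this toy matrix the split that follows the block structure cuts no edges, while the split through a block is penalized, which is the behavior a normalized-cut heuristic relies on when choosing coclusters.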
