Abstract

Social networks, communication networks, biological networks and many other information networks can be modeled as large graphs. Graph vertices represent entities, and graph edges represent the relationships or interactions among entities. In many large graphs, one or more attributes are associated with each vertex to describe its properties. The goal of graph clustering is to partition the vertices of a large graph into subgraphs (clusters) based on a set of criteria, such as vertex similarity measures, adjacency-based measures, connectivity-based measures, density measures, or cut-based measures. Although graph clustering has been studied extensively, clustering analysis of large graphs with rich attributes remains a major challenge in practice. In this chapter we first give an overview of the issues and challenges in clustering large graphs whose vertices carry rich attributes. Based on the type of measure used to identify clusters, existing graph clustering methods can be categorized into three classes: structure-based clustering, attribute-based clustering and structure-attribute-based clustering. Structure-based clustering focuses mainly on the topological structure of a graph but largely ignores the vertex properties, which are often heterogeneous. Attribute-based clustering, in contrast, focuses primarily on attribute-based vertex similarity, but the resulting clusters can be poorly connected or even fragmented within the graph. Structure-attribute-based clustering is a hybrid approach that combines structural and attribute similarities through a unified distance measure. We argue that effective clustering analysis of a large graph with rich attributes requires a systematic graph analysis framework that partitions the graph based on both structural similarity and attribute similarity.
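To make the three classes concrete, here is a minimal sketch of a unified distance. It assumes illustrative choices that are not the chapter's own measure: structural distance is the Jaccard distance between closed neighbourhoods, attribute distance is the fraction of attributes on which two vertices disagree, and a hypothetical mixing weight `alpha` blends the two. Setting `alpha=1` gives a purely structure-based measure and `alpha=0` a purely attribute-based one; anything in between is a simple structure-attribute hybrid.

```python
from typing import Dict, Sequence, Set


def structural_distance(adj: Dict[int, Set[int]], u: int, v: int) -> float:
    """Jaccard distance between the closed neighbourhoods of u and v."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return 1.0 - len(nu & nv) / len(nu | nv)


def attribute_distance(attrs: Dict[int, Sequence[str]], u: int, v: int) -> float:
    """Fraction of attributes on which u and v take different values."""
    a, b = attrs[u], attrs[v]
    return sum(x != y for x, y in zip(a, b)) / len(a)


def unified_distance(adj: Dict[int, Set[int]], attrs: Dict[int, Sequence[str]],
                     u: int, v: int, alpha: float = 0.5) -> float:
    """Hypothetical linear blend: alpha weights structure, 1 - alpha attributes."""
    return (alpha * structural_distance(adj, u, v)
            + (1 - alpha) * attribute_distance(attrs, u, v))


# A small example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
attrs = {0: ("x", "p"), 1: ("x", "q"), 2: ("x", "p"), 3: ("y", "q")}
print(unified_distance(adj, attrs, 0, 1))
```

Vertices 0 and 1 are structurally identical but disagree on one of two attributes, so a structure-only measure sees them as identical while the blended measure places them at a small positive distance.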
One approach is to model the rich attributes of vertices as auxiliary edges among vertices, resulting in a complex attribute-augmented graph with multiple edges between some vertices. To show how best to combine structural and attribute similarity in a unified framework, the second part of this chapter outlines a cluster-convergence-based iterative edge-weight assignment scheme that assigns different weights to different attributes based on how fast the clusters converge. We use a K-Medoids clustering algorithm, with iterative weight updates, to partition a graph into k clusters that have both cohesive intra-cluster structures and homogeneous attribute values. At each iteration, a series of matrix multiplications is used to calculate the random walk distances between graph vertices, and optimizations reduce the cost of recalculating these distances after each edge-weight update. Finally, we discuss open problems in graph clustering with rich attributes, including storage cost and efficiency, scalable analytics under memory constraints, distributed graph clustering and parallel processing.
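The iterative pipeline described above can be sketched in plain Python. This is a minimal sketch under several assumptions not stated in the abstract: attribute values are compared for exact equality; every pair of vertices sharing the value of attribute `a` receives one auxiliary edge of weight `weights[a]`; the random walk score is an assumed truncated random walk with restart, R = Σ_{l=1..L} c(1−c)^l P^l, where P is the row-normalised augmented adjacency matrix and `c`, `L` are arbitrary; higher scores mean closer vertices; and initial medoids are supplied by the caller. In place of the chapter's convergence-based scheme, the weight update is a simple hypothetical vote (an attribute gains weight when many vertices share its value with their medoid). All function names are hypothetical, and the score matrix is recomputed from scratch each round, without the incremental optimizations the chapter mentions.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def augmented_adjacency(n: int, edges: Sequence[Tuple[int, int]],
                        attrs: Sequence[Sequence[str]],
                        weights: Sequence[float]) -> Matrix:
    """Structural edges weigh 1; each shared value of attribute a adds an
    auxiliary edge of weight weights[a] (so some pairs get multiple edges)."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] += 1.0
        A[v][u] += 1.0
    for a, w in enumerate(weights):
        for u in range(n):
            for v in range(u + 1, n):
                if attrs[u][a] == attrs[v][a]:
                    A[u][v] += w
                    A[v][u] += w
    return A


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def random_walk_scores(A: Matrix, c: float = 0.2, L: int = 4) -> Matrix:
    """Assumed truncated restart walk: R = sum_{l=1..L} c(1-c)^l P^l."""
    P = []
    for row in A:
        s = sum(row)
        P.append([x / s if s else 0.0 for x in row])
    R = [[0.0] * len(A) for _ in A]
    Pl = P
    for l in range(1, L + 1):
        coef = c * (1 - c) ** l
        R = [[r + coef * p for r, p in zip(rr, pr)] for rr, pr in zip(R, Pl)]
        if l < L:
            Pl = matmul(Pl, P)  # one matrix multiplication per extra walk step
    return R


def assign(R: Matrix, medoids: List[int]) -> List[int]:
    """Each medoid labels itself; other vertices join the highest-scoring medoid."""
    return [medoids.index(v) if v in medoids
            else max(range(len(medoids)), key=lambda i: R[v][medoids[i]])
            for v in range(len(R))]


def k_medoids(R: Matrix, medoids: List[int], max_iter: int = 20):
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(R, medoids)
        new = []
        for i in range(len(medoids)):
            members = [v for v in range(len(R)) if labels[v] == i]
            # Move the medoid to the member that the cluster reaches most easily.
            new.append(max(members, key=lambda m: sum(R[u][m] for u in members)))
        if new == medoids:
            break
        medoids = new
    return medoids, assign(R, medoids)


def update_weights(attrs, medoids, labels, weights):
    """Hypothetical vote-based update; keeps the weights summing to len(weights)."""
    m = len(weights)
    votes = [sum(1 for v, i in enumerate(labels) if attrs[v][a] == attrs[medoids[i]][a])
             for a in range(m)]
    total = sum(votes)  # > 0 because every medoid agrees with itself
    return [(w + m * vote / total) / 2 for w, vote in zip(weights, votes)]


def cluster(n, edges, attrs, medoids, rounds=3):
    weights = [1.0] * len(attrs[0])
    for _ in range(rounds):  # fixed round count instead of a convergence test
        R = random_walk_scores(augmented_adjacency(n, edges, attrs, weights))
        medoids, labels = k_medoids(R, medoids)
        weights = update_weights(attrs, medoids, labels, weights)
    return medoids, labels, weights


# Two triangles joined by the edge 2-3; attribute 0 matches the triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
attrs = [("x", "p"), ("x", "q"), ("x", "p"), ("y", "q"), ("y", "p"), ("y", "q")]
medoids, labels, weights = cluster(6, edges, attrs, medoids=[0, 5])
print(medoids, labels, weights)
```

The dense pure-Python matrices keep the sketch self-contained; for large graphs a sparse representation and reuse of earlier powers of P would be needed, which is exactly the storage and recomputation cost the chapter identifies as an open problem.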
