Abstract

Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price, however: while MC is very efficient in terms of space, CC’s memory requirements become demanding as the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state of the art, both in terms of the size of the input graph and of that of the graphlets.

Highlights

  • Counting graphlets is a well-studied problem in graph mining and social-networks analysis [1, 3, 7, 8, 11, 14, 18, 20, 27, 28, 29, 32]

  • We show that even a single run of color coding (CC), whose output can be seen as a large sample of the population of graphlets, gives reasonably good statistical guarantees

  • We note that Pairwise Subgraph Random walk (PSRW) has been developed with the primary goal of minimizing the number of nodes of G visited by the walk; in the present article, we investigate it in terms of samples taken, running time, and accuracy


Summary

Introduction

Counting graphlets is a well-studied problem in graph mining and social-networks analysis [1, 3, 7, 8, 11, 14, 18, 20, 27, 28, 29, 32]. The problem asks to count the frequencies of all induced connected subgraphs (called graphlets), up to isomorphism, of a certain size. Understanding the distribution of graphlets allows us to make key inferences about the structural properties of the underlying graph and the interaction of the nodes in the graph (e.g., [22]). It sheds light on the type of local structures that are present in the graph, which can be used for a myriad of analyses [3, 8, 16, 27, 28, 29]. How the graphlets form in the first place and how they temporally evolve are semantically more actionable than the interpretation yielded by the mere evolution of nodes and edges.
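
To make the problem statement concrete, the following is a minimal brute-force sketch of exact graphlet counting on a tiny graph. It is not the color coding or Markov chain method discussed in the paper; it simply enumerates every k-node subset, keeps the connected ones, and groups them by isomorphism class. The function names (`count_graphlets`, `canonical_form`, `is_connected`) and the edge-list input format are assumptions made for illustration, and the approach is only feasible for very small graphs and small k.

```python
from collections import Counter
from itertools import combinations, permutations


def is_connected(nodes, adj):
    """Check that the subgraph induced by `nodes` is connected (DFS)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def canonical_form(nodes, adj):
    """Return the smallest sorted edge tuple over all relabelings of `nodes`.

    Two induced subgraphs are isomorphic exactly when their canonical
    forms coincide. Brute force over k! relabelings: small k only.
    """
    best = None
    for perm in permutations(range(len(nodes))):
        label = dict(zip(nodes, perm))
        edges = tuple(sorted(
            tuple(sorted((label[u], label[v])))
            for u, v in combinations(nodes, 2)
            if v in adj[u]
        ))
        if best is None or edges < best:
            best = edges
    return best


def count_graphlets(edges, k):
    """Count induced connected k-node subgraphs, grouped by isomorphism class."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = Counter()
    for nodes in combinations(sorted(adj), k):
        if is_connected(nodes, adj):
            counts[canonical_form(nodes, adj)] += 1
    return counts


# Hypothetical toy input: a triangle 0-1-2 with a pendant edge 2-3.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
toy_counts = count_graphlets(toy_edges, 3)
```

On this toy graph the 3-node graphlets are one triangle ({0, 1, 2}) and two induced paths ({0, 2, 3} and {1, 2, 3}). The enumeration cost grows as the number of k-subsets of the node set, which is why sampling-based methods are studied for graphs with millions of nodes.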

