Abstract

We study the graphlet sampling problem: given an integer $k \ge 3$ and a graph $G=(V,E)$, sample a connected induced $k$-node subgraph of $G$ (also called a $k$-graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. The two state-of-the-art techniques are random walks and color coding. The random walk is elegant, but the current upper and lower bounds on its mixing time suffer a gap of $\Delta^{k-1}$, where $\Delta$ is the maximum degree of $G$. Color coding is better understood, but requires a $2^{O(k)} m$-time preprocessing over the entire graph. Moreover, no efficient algorithm is known for sampling graphlets uniformly: random walks and color coding yield only $\epsilon$-uniform samples. In this work, we provide the following results: (i) A near-optimal mixing time bound for the classic $k$-graphlet random walk, as a function of the mixing time of $G$. In particular, ignoring $k^{O(k)}$ factors, we show that the $k$-graphlet random walk mixes in $\Theta(t(G) \cdot \rho(G)^{k-1})$ steps, where $t(G)$ is the mixing time of $G$ and $\rho(G)$ is the ratio between its maximum and minimum degree, and on some graphs this is tight up to $\lg n$ factors. (ii) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time $O(n k^2 \lg k + m)$ and space $O(n)$, and a sampling phase that takes $k^{O(k)} \lg \Delta$ time per sample. It is based on ordering $G$ in a simple way, so as to virtually partition the graphlets into buckets, and then sampling from those buckets using rejection sampling. The algorithm can also be used for counting, with additive guarantees. (iii) A near-optimal algorithm for $\epsilon$-uniform graphlet sampling, with a preprocessing phase that runs in time $O(k^6 \epsilon^{-1} n \lg n)$ and space $O(n)$, and a sampling phase that takes $k^{O(k)} (1/\epsilon)^{10} \lg(1/\epsilon)$ expected time per sample. The algorithm is based on a nontrivial sketching of the ordering of $G$, followed by emulating uniform sampling through coupling arguments.
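
To make the problem statement concrete, the following is a minimal brute-force sketch of what a uniform $k$-graphlet sample is. It is not the algorithm of this work: it enumerates every $k$-node subset, so it is only feasible on tiny graphs. The function names and the example graph `adj` are hypothetical and chosen purely for illustration.

```python
import itertools
import random


def is_connected(nodes, adj):
    """Return True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            # Only follow edges that stay inside the induced subgraph.
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def enumerate_graphlets(adj, k):
    """List every connected induced k-node subgraph, as a frozenset of nodes."""
    return [
        frozenset(subset)
        for subset in itertools.combinations(sorted(adj), k)
        if is_connected(subset, adj)
    ]


def sample_graphlet(adj, k, rng=random):
    """Draw one k-graphlet uniformly at random (brute-force baseline)."""
    return rng.choice(enumerate_graphlets(adj, k))


# Hypothetical example graph: a triangle on {1, 2, 3} with a pendant node 0.
adj = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2}}

if __name__ == "__main__":
    print(sorted(sorted(g) for g in enumerate_graphlets(adj, 3)))
    print(sorted(sample_graphlet(adj, 3)))
```

On this example the subset {0, 2, 3} is excluded because node 0 has no neighbor inside it, and each of the remaining 3-node subsets is returned with equal probability. The running time of this baseline grows with the number of $k$-node subsets, which is the cost that the preprocessing and sampling bounds stated in the abstract avoid.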
