Abstract

We show that uniform random sampling is less effective than PPS (probability proportional to size) sampling in many estimation tasks. For graph size estimation, we demonstrate that random edge sampling outperforms random node sampling, with a performance ratio proportional to the normalized degree variance of the graph. This result is particularly important in the era of big data, where data sets are typically large and scale-free and therefore have large degree variance. We derive the result by first giving the variances of the random node and random edge estimators. A simpler and more intuitive result then follows under the assumption that the data set is large and its degree distribution follows a power law.
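
The comparison described above can be illustrated with a minimal sketch. The abstract does not state which estimators are analyzed, so the collision-based (birthday-paradox) estimators below are an assumption chosen for illustration, and all function names are hypothetical. Random edge sampling is simulated by drawing nodes with probability proportional to degree, which is what picking a random edge and taking one of its endpoints yields; the heavy-tailed degree sequence is synthetic.

```python
import random
from collections import Counter


def count_collisions(sample):
    """Count pairs of sample positions that landed on the same node."""
    return sum(c * (c - 1) // 2 for c in Counter(sample).values())


def estimate_size_uniform(sample):
    """Collision estimator for a uniform random node sample (assumed form)."""
    n = len(sample)
    collisions = count_collisions(sample)
    if collisions == 0:
        return float("inf")  # no repeated node, so no finite estimate
    return n * (n - 1) / (2 * collisions)


def estimate_size_pps(sample, degree):
    """Collision estimator for a degree-proportional (PPS) sample (assumed form).

    Each sampled node is reweighted by its degree to undo the sampling bias.
    """
    n = len(sample)
    collisions = count_collisions(sample)
    if collisions == 0:
        return float("inf")
    sum_deg = sum(degree[v] for v in sample)
    sum_inv_deg = sum(1.0 / degree[v] for v in sample)
    # Subtracting n drops the i == j terms of the double sum over d_i / d_j.
    return (sum_deg * sum_inv_deg - n) / (2 * collisions)


if __name__ == "__main__":
    random.seed(0)
    true_size = 10_000   # hypothetical graph size
    sample_size = 2_000  # hypothetical sample size
    nodes = range(true_size)
    # Synthetic heavy-tailed degree sequence (illustrative only).
    degree = [int(random.paretovariate(1.5)) + 1 for _ in nodes]

    node_sample = random.choices(nodes, k=sample_size)
    edge_sample = random.choices(nodes, weights=degree, k=sample_size)

    print("random node estimate:", estimate_size_uniform(node_sample))
    print("random edge estimate:", estimate_size_pps(edge_sample, degree))
```

Repeating the experiment over many seeds and comparing the spread of the two estimates gives an empirical view of the variance gap; when all degrees are equal the two estimators coincide, which is consistent with a performance ratio driven by degree variance.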
