Abstract

Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.

Highlights

  • Many empirical quantities cluster around a typical value

  • Assuming that our data are drawn from a distribution that follows a power law exactly for x ≥ xmin, we can derive maximum likelihood estimators (MLEs) of the scaling parameter for both the discrete and continuous cases

  • We find the Anderson–Darling statistic, by contrast, to be highly conservative in this application, giving estimates of xmin that are too large by an order of magnitude or more
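For reference, the maximum likelihood estimators mentioned above take simple closed forms. In the continuous case, for the n observations with x_i ≥ xmin, the estimator is exact:

$$\hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1},$$

while in the discrete case an accurate approximation is

$$\hat{\alpha} \simeq 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min} - \tfrac{1}{2}} \right]^{-1}.$$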


Introduction

Many empirical quantities cluster around a typical value. The speeds of cars on a highway, the weights of apples in a store, air pressure, sea level, the temperature in New York at noon on a midsummer’s day: all of these things vary somewhat, but their distributions place a negligible amount of probability far from the typical value, making the typical value representative of most observations.

Assuming that our data are drawn from a distribution that follows a power law exactly for x ≥ xmin, we can derive maximum likelihood estimators (MLEs) of the scaling parameter for both the discrete and continuous cases.

[Figure: values of the scaling parameter estimated using four of the methods of Table 2 (omitting the methods based on logarithmic bins for the PDF and constant-width bins for the CDF) for n = 10 000 observations drawn from (a) discrete and (b) continuous power-law distributions with xmin = 1.]

The MLE gives accurate answers when xmin is chosen exactly equal to the true value, but it deviates rapidly below this point (because the distribution deviates from a power law there) and more slowly above it (because of dwindling sample size). It would probably be acceptable in this case for xmin to err a little on the high side (though not too much), but estimates that are too low could have severe consequences.

There are a variety of measures for quantifying the distance between two probability distributions, but for non-normal data the commonest is the Kolmogorov–Smirnov or KS statistic [46], which is the maximum distance between the CDFs of the data and the fitted model.
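As a concrete illustration of the estimator and the distance measure just described, here is a minimal sketch (not the authors' code; function names and the inverse-transform sampler are our own illustrative choices) that fits the continuous-case MLE and computes the KS distance on a synthetic sample:

```python
import math
import random

def fit_alpha_continuous(data, xmin):
    """MLE for the continuous case: alpha_hat = 1 + n / sum(ln(x_i / xmin)),
    using only the tail observations x >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(data, alpha, xmin):
    """KS statistic: the maximum distance between the empirical CDF of the
    tail data and the fitted power-law CDF P(x) = 1 - (x/xmin)**(1 - alpha)."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # the empirical CDF steps from i/n to (i+1)/n at x;
        # check the distance on both sides of the step
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d

# Synthetic continuous power-law sample via inverse-transform sampling:
# x = xmin * (1 - u)**(-1/(alpha - 1)) for uniform u in [0, 1)
random.seed(1)
true_alpha, xmin = 2.5, 1.0
sample = [xmin * (1.0 - random.random()) ** (-1.0 / (true_alpha - 1.0))
          for _ in range(10_000)]

alpha_hat = fit_alpha_continuous(sample, xmin)
D = ks_distance(sample, alpha_hat, xmin)
print(f"alpha_hat = {alpha_hat:.3f}, KS distance = {D:.4f}")
```

With n = 10 000 the estimate lands close to the true scaling parameter and the KS distance is small; scanning candidate values of xmin and keeping the one that minimizes this distance is the usual way to choose the lower cutoff.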
