Learning and Interpreting Complex Distributions in Empirical Data

Chengxi Zang, Peng Cui, Wenwu Zhu

doi:10.1145/3219819.3220073

Abstract

Fitting empirical data distributions and then interpreting them in a generative way is a common research paradigm for understanding the structure and dynamics underlying data in various disciplines. However, previous works mainly fit or interpret empirical data distributions case by case. Faced with complex data distributions in the real world, can we fit and interpret them with a unified yet parsimonious parametric model? In this paper, we view complex empirical data as being generated by a dynamic system that takes uniform randomness as input. By modeling the generative dynamics of the data, we present a four-parameter dynamic model, together with inference and simulation algorithms, that is able to fit and generate a family of distributions, ranging from Gaussian, Exponential, Power Law, and Stretched Exponential (Weibull) to their complex variants with multi-scale complexities. Rather than being a black box, our model can be interpreted through a unified differential equation, which captures the underlying generative dynamics. More powerful models can be constructed within our framework in a principled way. We validate our model on various synthetic datasets. We then apply our model to 16 real-world datasets from different disciplines. We show that the most widely used methods fit these datasets with systematic biases, and we show the superiority of our model. In short, our model potentially provides a framework to fit complex distributions in empirical data and, more importantly, to understand their generative mechanisms.
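The idea of generating data from uniform randomness can be illustrated with classical inverse transform sampling. The sketch below is not the four-parameter model of the paper, which the abstract does not specify; it only shows, under that stated assumption, how a single uniform input u in [0, 1) is mapped to Exponential, Power Law (Pareto), and Stretched Exponential (Weibull) samples. All function names and default parameter values are hypothetical and chosen for illustration.

```python
import math
import random


def sample_exponential(u, rate=1.0):
    """Map uniform u in [0, 1) to an Exponential(rate) sample."""
    # Inverse CDF: F^{-1}(u) = -ln(1 - u) / rate
    return -math.log(1.0 - u) / rate


def sample_power_law(u, x_min=1.0, alpha=2.5):
    """Map uniform u in [0, 1) to a Pareto sample with density ~ x^(-alpha)."""
    # Valid for alpha > 1 and x >= x_min.
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))


def sample_weibull(u, scale=1.0, shape=0.5):
    """Map uniform u in [0, 1) to a Weibull (stretched exponential) sample."""
    # shape < 1 gives a stretched exponential tail; shape == 1 is Exponential.
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def generate(transform, n, seed=0, **params):
    """Draw n samples by pushing uniform random numbers through `transform`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [transform(rng.random(), **params) for _ in range(n)]


if __name__ == "__main__":
    for name, fn in [("exponential", sample_exponential),
                     ("power law", sample_power_law),
                     ("weibull", sample_weibull)]:
        xs = generate(fn, 10_000)
        print(name, "min:", min(xs), "max:", max(xs))
```

Each transform is the inverse cumulative distribution function of its target, so the same uniform source yields very different tails depending only on the mapping applied to it.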

Full Text