Abstract

The complexity underlying real-world systems implies that standard statistical hypothesis testing methods may not be adequate for these peculiar applications. Specifically, we show that the likelihood-ratio (LR) test's null-distribution needs to be modified to accommodate the complexity found in multi-edge network data. When working with independent observations, the p-values of LR tests are approximated using a χ² distribution. However, such an approximation should not be used when dealing with multi-edge network data. This type of data is characterized by multiple correlations and competitions that make the standard approximation unsuitable. We address the problem by providing a better approximation of the LR test null-distribution through a beta distribution. Finally, we empirically show that even for a small multi-edge network, the standard χ² approximation provides erroneous results, while the proposed beta approximation yields the correct p-value estimation.
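To make the standard procedure concrete, here is a minimal sketch of an LR test in the independent-observation setting, where the χ² approximation is the usual choice. It is not the method proposed in the paper: the coin-flip model, the function name `lr_test_binomial`, and the convention that λ is the ratio of the null likelihood to the alternative likelihood are all assumptions made for illustration.

```python
import math


def lr_test_binomial(successes, trials, p0=0.5):
    """LR test of H0: p = p0 against H1: p unconstrained.

    Assumes independent Bernoulli trials and 0 < p0 < 1.
    Returns (lambda, statistic, p_value), where lambda is the
    likelihood ratio L(H0) / L(H1) and statistic = -2 * log(lambda).
    """
    p_hat = successes / trials  # maximum-likelihood estimate under H1

    def log_lik(p):
        # Binomial log-likelihood up to an additive constant that
        # cancels in the ratio; terms with a zero count are skipped.
        ll = 0.0
        if successes > 0:
            ll += successes * math.log(p)
        if trials - successes > 0:
            ll += (trials - successes) * math.log(1.0 - p)
        return ll

    log_lambda = log_lik(p0) - log_lik(p_hat)   # always <= 0
    statistic = max(0.0, -2.0 * log_lambda)     # guard against rounding

    # Survival function of a chi-squared distribution with 1 degree
    # of freedom (the models differ by one free parameter).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return math.exp(log_lambda), statistic, p_value


lam, statistic, p_value = lr_test_binomial(60, 100)
print(f"lambda={lam:.4f}  -2 log(lambda)={statistic:.4f}  p={p_value:.4f}")
```

The closed form for the p-value holds only for one degree of freedom; for nested models that differ by more parameters, a general χ² survival function would be needed. The abstract's point is that this whole approximation step should not be carried over to multi-edge network data.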

Highlights

  • Complex systems are notoriously challenging to analyze due to the large number of interdependencies, competitions, and correlations underlying their dynamics

  • The study of complex systems is intertwined with network science and advanced multivariate statistics

  • Because interactions between system agents tend not to be independent, many standard statistical methods should be employed with care when dealing with network data


Summary

Overview

Complex systems are notoriously challenging to analyze due to the large number of interdependencies, competitions, and correlations underlying their dynamics. To deal with these issues, data-driven studies of complex systems are based, either directly or indirectly, on the careful formulation of models representing different hypotheses about the system. Less attention has been devoted to developing hypothesis testing methods specific to network models and network data, commonly used to study complex systems. If the alternative hypothesis does not fit the observed data well, we can expect the probability of observing λ from the null-distribution to be relatively large. We are often faced with multi-edge network data. These data consist of m repeated (and possibly time-stamped) edges (i, j) representing interactions between n different agents i, j, the vertices of the network. Such models can be used to evaluate different hypotheses about the data [4].
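As an illustration of the data format described above, the following sketch aggregates a list of repeated edges (i, j) into per-pair multi-edge counts. The function name `multi_edge_counts`, the toy edge list, and the choice to treat edges as directed are assumptions for this example, not details taken from the paper.

```python
from collections import Counter


def multi_edge_counts(edges):
    """Aggregate repeated edges (i, j) into an n x n matrix of counts.

    Edges are treated as directed here (an assumption); entry [a][b]
    holds how many times the edge from vertex a to vertex b occurs.
    """
    counts = Counter(edges)
    vertices = sorted({v for edge in edges for v in edge})
    index = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    adjacency = [[0] * n for _ in range(n)]
    for (i, j), count in counts.items():
        adjacency[index[i]][index[j]] = count
    return vertices, adjacency


# Hypothetical interaction log: m = 5 edges among n = 3 agents.
edges = [("a", "b"), ("a", "b"), ("b", "c"), ("a", "c"), ("a", "b")]
vertices, adjacency = multi_edge_counts(edges)
m = sum(sum(row) for row in adjacency)

print(vertices)
for row in adjacency:
    print(row)
print("m =", m)
```

The total m equals the number of observed interactions, so the counts in the matrix are not free to vary independently once m is fixed.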

Statistical hypotheses and gHypEG
Encoding hypotheses
Simulation studies
Case study
Discussion
