Abstract

In this article we propose a methodology for inferring binary-valued adjacency matrices from various measures of the strength of association between pairs of network nodes, or more generally pairs of variables. This strength of association can be quantified by sample covariance and correlation matrices, and more generally by test statistics and hypothesis test p-values from arbitrary distributions. Community detection methods such as block modeling typically require binary-valued adjacency matrices as a starting point. Hence, a main motivation for the proposed methodology is to obtain binary-valued adjacency matrices from such pairwise measures of strength of association between variables. The methodology is applicable to large high-dimensional data sets and is based on computationally efficient algorithms. We illustrate its utility in a range of contexts and data sets.

Highlights

  • Networks and other non-Euclidean relational data sets have become important applications in modern statistics

  • One of the most widely studied of these models is the stochastic blockmodel, in which there is a greater probability of observing an edge between a pair of nodes if they are in the same block, or community

  • The methodology that we propose in this article allows a binary-valued adjacency matrix to be estimated based on association matrices composed of sample covariances, or correlations, or test statistics from arbitrary known or unknown distributions
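
The stochastic blockmodel mentioned in the highlights can be illustrated with a minimal sketch. It assumes only what is stated above: an edge is more likely between two nodes in the same block than between nodes in different blocks. The function name `sample_sbm` and the parameters `p_within`, `p_between`, and `seed` are hypothetical and chosen for illustration; they are not taken from the article.

```python
import random


def sample_sbm(blocks, p_within, p_between, seed=0):
    """Sample a symmetric binary adjacency matrix from a simple blockmodel.

    blocks[i] is the block (community) label of node i. An edge between
    nodes i and j is drawn with probability p_within if they share a
    block, and p_between otherwise. All names here are illustrative.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    n = len(blocks)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; no self-loops
            p = p_within if blocks[i] == blocks[j] else p_between
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1  # mirror to keep the matrix symmetric
    return adj


# Six nodes in two blocks of three, with dense within-block connections.
blocks = [0, 0, 0, 1, 1, 1]
adj = sample_sbm(blocks, p_within=0.9, p_between=0.1)
for row in adj:
    print(row)
```

With `p_within` well above `p_between`, the printed matrix tends to show two dense diagonal blocks, which is the structure that community detection methods try to recover.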

Summary

Introduction

Networks and other non-Euclidean relational data sets have become important applications in modern statistics. The methodology that we propose in this article allows a binary-valued adjacency matrix to be estimated from association matrices composed of sample covariances, correlations, or test statistics from arbitrary known or unknown distributions. This binary-valued adjacency matrix is an ideal summary of the relational data set on which to carry out community detection. The main motivation of this article is to propose a methodology that transforms continuous-valued statistics, which measure the strength of association between pairs of variables, into a binary-valued adjacency matrix for use in community detection. In this format, 1s and 0s can be considered to represent pairs of variables that are and are not correlated, respectively.
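
To make the target format concrete, the following sketch turns pairwise sample correlations into a binary adjacency matrix. It is not the methodology proposed in the article, which is not detailed in this summary: as a stand-in, it applies a fixed cutoff to the absolute correlation. The names `pearson` and `adjacency_from_columns`, the `threshold` parameter, and the toy data are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency_from_columns(columns, threshold=0.5):
    """Binary adjacency matrix: 1 where |correlation| >= threshold.

    The fixed threshold is an illustrative assumption, not the
    model-based inference described in the article.
    """
    p = len(columns)
    adj = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):  # each unordered pair once; diagonal stays 0
            if abs(pearson(columns[i], columns[j])) >= threshold:
                adj[i][j] = adj[j][i] = 1
    return adj


# Toy data: variables 0 and 1 are nearly collinear; variable 2 is not.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.1, 9.8],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
adj = adjacency_from_columns(columns, threshold=0.8)
for row in adj:
    print(row)
```

Here a 1 marks a pair of variables treated as correlated and a 0 a pair treated as uncorrelated. A fixed cutoff is the simplest possible rule and requires choosing the threshold by hand, which is one reason a data-driven approach such as the one the article proposes may be preferable.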

Proposed methodology
Applying the model to test statistics from arbitrary distributions
Model fitting and adjacency matrix inference
Community detection
Examples
Simulation study
Comparison with popular clustering methods
Genomics example
Consumer product review example
Conclusion