Abstract

We analyze the performance of regular decomposition, a method for compressing large and dense graphs. The method is inspired by Szemerédi’s regularity lemma (SRL), a generic structural result about large dense graphs. In our method, a stochastic block model (SBM) is fitted by maximum likelihood to find a regular structure similar to the one predicted by SRL. Another ingredient of our method is Rissanen’s minimum description length (MDL) principle. We consider scaling the algorithms to extremely large graphs by sampling a small subgraph. We continue our previous work on the subject by proving some claims that were previously found experimentally. Our theoretical setting does not assume that the graph is generated from an SBM; instead, the task is to find the SBM that is optimal, in the MDL sense, for modeling the given graph. This setting matches real-life situations in which no random generative model is appropriate. Our aim is to show that regular decomposition is a viable and robust method for large graphs arising, for example, in the Big Data area.
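
The combination of SBM maximum-likelihood fitting and MDL described above can be illustrated with a small pure-Python sketch. It assumes an undirected graph stored as a list of neighbour sets, fits a k-block SBM by alternating between estimating block densities and reassigning every node to the block that best explains its edge profile, and then chooses k by a simplified two-part code length. The function names, the random-restart scheme, and the particular model-cost term (n·ln k nats for the labels plus ½·ln(number of node pairs) per block density) are illustrative assumptions, not the exact algorithm or code length used in [2].

```python
import math
import random

EPS = 1e-9  # clip densities away from 0 and 1 so that log() stays finite


def block_counts(adj, labels, k):
    """Edge counts e[i][j] and node-pair counts pairs[i][j] between blocks (i <= j)."""
    sizes = [labels.count(c) for c in range(k)]
    e = [[0] * k for _ in range(k)]
    for u, nbrs in enumerate(adj):
        for v in nbrs:
            if u < v:  # count each undirected edge once
                a, b = sorted((labels[u], labels[v]))
                e[a][b] += 1
    pairs = [[sizes[i] * (sizes[i] - 1) // 2 if i == j else sizes[i] * sizes[j]
              for j in range(k)] for i in range(k)]
    return e, pairs


def densities(e, pairs, k):
    """Symmetric matrix of clipped block densities (0.5 where a block pair is empty)."""
    d = [[0.5] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            if pairs[i][j] > 0:
                d[i][j] = d[j][i] = min(max(e[i][j] / pairs[i][j], EPS), 1 - EPS)
    return d


def log_likelihood(adj, labels, k):
    """Bernoulli log-likelihood (nats) of the graph under its own block densities."""
    e, pairs = block_counts(adj, labels, k)
    d = densities(e, pairs, k)
    return sum(e[i][j] * math.log(d[i][j]) + (pairs[i][j] - e[i][j]) * math.log(1 - d[i][j])
               for i in range(k) for j in range(i, k))


def fit_sbm(adj, k, restarts=10, max_iter=50, seed=0):
    """Best-of-restarts ML fit: alternate density estimation and node reassignment."""
    n, rng = len(adj), random.Random(seed)
    best, best_ll = None, -math.inf
    for _ in range(restarts):
        labels = [rng.randrange(k) for _ in range(n)]
        for _ in range(max_iter):
            d = densities(*block_counts(adj, labels, k), k)
            sizes = [labels.count(c) for c in range(k)]
            new = []
            for v in range(n):
                x = [0] * k  # neighbours of v in each block
                for u in adj[v]:
                    x[labels[u]] += 1
                m = sizes[:]  # other nodes in each block (v itself excluded)
                m[labels[v]] -= 1
                # negative log-likelihood of v's edge profile if v were placed in block c
                cost = [-sum(x[j] * math.log(d[c][j]) + (m[j] - x[j]) * math.log(1 - d[c][j])
                             for j in range(k)) for c in range(k)]
                new.append(min(range(k), key=cost.__getitem__))
            if new == labels:
                break  # fixed point: no node wants to move
            labels = new
        ll = log_likelihood(adj, labels, k)
        if ll > best_ll:
            best, best_ll = labels, ll
    return best, best_ll


def description_length(adj, labels, k):
    """Simplified two-part code length in nats: data given the model plus model cost."""
    n = len(adj)
    model_cost = n * math.log(k) + 0.5 * (k * (k + 1) / 2) * math.log(n * (n - 1) / 2)
    return -log_likelihood(adj, labels, k) + model_cost


def choose_k(adj, k_max):
    """Number of blocks in 1..k_max whose best fit gives the shortest description."""
    lengths = {k: description_length(adj, fit_sbm(adj, k)[0], k) for k in range(1, k_max + 1)}
    return min(lengths, key=lengths.get)
```

For instance, on a graph made of two disjoint 20-node cliques, `fit_sbm(adj, 2)` should place each clique in its own block, and `choose_k(adj, 4)` should prefer k = 2, because extra blocks add model cost without improving the likelihood.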

Highlights

  • In the conference paper [1] we conjectured that our regular decomposition algorithm [2] could be applied, via a sampling approach, to very large graphs whose full adjacency information cannot be processed

  • Our future work will be devoted to sparse graphs, the case most important in Big Data

  • Testable graph parameters, introduced by László Lovász and coauthors (see [17]), are nonparametric statistics that can be consistently estimated by appropriate sampling
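
Edge density is the simplest example of such a parameter: the subgraph induced by a uniformly random set of q nodes has, in expectation, exactly the edge density of the whole graph, because every node pair is equally likely to be included. The pure-Python sketch below estimates the density by averaging over several samples. The function names and the list-of-neighbour-sets representation are hypothetical, and the sketch illustrates estimation by sampling in general, not the specific estimators of [17].

```python
import random


def edge_density(adj, nodes):
    """Fraction of node pairs within `nodes` that are joined by an edge."""
    nodes = list(nodes)
    m = len(nodes)
    # query only pairs inside the node set, so the cost is m*(m-1)/2 lookups
    edges = sum(1 for i, u in enumerate(nodes) for v in nodes[i + 1:] if v in adj[u])
    return edges / (m * (m - 1) / 2)


def sampled_density(adj, q, trials, seed=0):
    """Average edge density of `trials` random induced subgraphs on q nodes."""
    rng = random.Random(seed)
    n = len(adj)
    return sum(edge_density(adj, rng.sample(range(n), q)) for _ in range(trials)) / trials
```

Each sample needs only q(q−1)/2 adjacency queries, independent of the size of the full graph, which is what makes such parameters usable when the whole adjacency structure is out of reach.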

Summary

Introduction

In the conference paper [1] we conjectured that our regular decomposition algorithm [2] could be applied, via a sampling approach, to very large graphs whose full adjacency information cannot be processed. Here we prove the claims of that paper and give precise conditions under which they hold. The method allows us to abandon the customary assumption that the graph is generated by an SBM. Revealing and understanding the various relations embedded in large data sets is of special interest; in mathematical terms, such relations form a huge graph, often too large to process in full. Our method suggests a way to overcome this hurdle in the case of dense data.
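
A minimal sketch of how the sampling approach can work, assuming an SBM has already been fitted to a random sample of nodes (for instance by running regular decomposition on the induced subgraph): block densities are estimated from the sample, and every remaining node is placed in the block that maximizes the likelihood of its edges to the sampled nodes. Only adjacencies between each node and the sample are queried, so the full adjacency matrix is never needed. The function names and the list-of-neighbour-sets representation are hypothetical choices for illustration, not the authors' implementation.

```python
import math

EPS = 1e-9  # keep densities strictly inside (0, 1) so log() is finite


def sample_densities(adj, sample, labels, k):
    """Block densities of the induced subgraph on `sample`; labels[i] is the block of sample[i]."""
    sizes = [labels.count(c) for c in range(k)]
    e = [[0] * k for _ in range(k)]
    for i, u in enumerate(sample):
        for j, v in enumerate(sample[i + 1:], start=i + 1):
            if v in adj[u]:  # each sampled pair is examined once
                a, b = sorted((labels[i], labels[j]))
                e[a][b] += 1
    d = [[0.5] * k for _ in range(k)]
    for a in range(k):
        for b in range(a, k):
            pairs = sizes[a] * (sizes[a] - 1) // 2 if a == b else sizes[a] * sizes[b]
            if pairs:
                d[a][b] = d[b][a] = min(max(e[a][b] / pairs, EPS), 1 - EPS)
    return d


def assign_remaining(adj, sample, labels, d, k):
    """Map each non-sampled node to its maximum-likelihood block given its edges to the sample."""
    block_of = dict(zip(sample, labels))
    sizes = [labels.count(c) for c in range(k)]
    result = {}
    for v in range(len(adj)):
        if v in block_of:
            continue
        x = [0] * k  # neighbours of v among the sampled nodes of each block
        for u in sample:
            if u in adj[v]:
                x[block_of[u]] += 1
        # negative log-likelihood of v's edges to the sample if v belonged to block c
        cost = [-sum(x[j] * math.log(d[c][j]) + (sizes[j] - x[j]) * math.log(1 - d[c][j])
                     for j in range(k)) for c in range(k)]
        result[v] = min(range(k), key=cost.__getitem__)
    return result
```

In this sketch each remaining node costs one adjacency query per sampled node plus O(k²) arithmetic, so the total work grows linearly in the number of nodes once the sample has been fitted.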
