We study the problem of approximately counting cliques and near cliques in a graph when the graph can be accessed only by crawling its vertices. This model, known as the random walk model or the neighborhood query model, was introduced recently to capture real-life scenarios in which the graph is too massive to be stored or scanned in its entirety. Sampling vertices independently is non-trivial in this model, so algorithms that rely on sampling often use a random walk. The goal is to provide an accurate estimate while seeing only a small portion of the graph. We introduce DeMEtRIS: Dense Motif Estimation through Random Incident Sampling, a scalable algorithm for clique and near clique counting in the random walk model. We establish the correctness of our algorithm through rigorous mathematical analysis and validate it with extensive experiments. Both our theoretical results and our experiments show that DeMEtRIS obtains a high-precision estimate while crawling only a sublinear portion of the vertices, a significant improvement over previously known results.