Abstract

Real datasets always play an essential role in graph mining and analysis. However, nowadays most available real datasets only support millions of nodes. Therefore, the literature on Big Data analysis utilizes statistical graph generators to generate a massive graph (e.g., billions of nodes) for evaluating the scalability of an algorithm. Nevertheless, current popular statistical graph generators are properly designed to preserve only the statistical metrics, such as the degree distribution, diameter, and clustering coefficient of the original social graphs. Recently, the importance of frequent graph patterns has been recognized in the various works on graph mining, but unfortunately this crucial criterion has not been noticed in the existing graph generators. To address this important need, we make the first attempt to design a Pattern Preserving Graph Generation (PPGG) algorithm to generate a graph including all frequent patterns and three most popular statistical parameters: degree distribution, clustering coefficient, and average vertex degree. The experimental results show that PPGG, which we have released as a free download, is efficient and able to generate a billion-node graph in approximately 10 minutes, much faster than the existing graph generators.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call