Abstract
A graph generator is a tool that creates graph-like data whose structural properties closely resemble those found in real-world networks. This paper presents two methods, based on the MapReduce processing model, for generating graphs with a power-law edge distribution; both can be easily implemented to run on top of Apache Hadoop. The proposed methods generate directed and undirected power-law distributed graphs without repeated edges. Our experimental evaluation shows that the methods are efficient and scalable in terms of both graph size and cluster capacity.
Highlights
Graphs are a recognized abstraction model, as they can represent structured and semi-structured data occurring in many application domains [1].
This paper presents the novel methods H4DG and H4UG to generate power-law distributed graphs based on the MapReduce programming model, so they can be implemented to run on Apache Hadoop.
Our methods H4DG and H4UG are inspired by R3MAT [5], so we present a brief description of R3MAT.
Summary
R. Angles et al.: Power-Law Distributed Graph Generation With MapReduce
Graphs are a recognized abstraction model, as they can represent structured and semi-structured data occurring in many application domains [1]. According to our review of the literature, several methods and tools have been created to exploit the advantages of parallel and distributed systems, allowing the generation of very large graphs (see Section II-A). This paper presents the novel methods H4DG and H4UG to generate power-law distributed graphs based on the MapReduce programming model, so they can be implemented to run on Apache Hadoop. H4DG generates directed graphs, whereas H4UG produces undirected ones. Both are inspired by R3MAT [5], a method that uses the graph degree distribution as the basis to conduct the generation process. H4DG [...] in 41.8 minutes, whereas H4UG produces a similar undirected graph in 40 minutes, both running on a cluster with 32 machines.
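To make the idea of degree-driven generation in a map/reduce style more concrete, the following is a minimal single-process sketch. It is not the H4DG, H4UG, or R3MAT algorithm, whose details are not given in this summary. It assumes a simple scheme in which each vertex receives an out-degree sampled from a discrete power law, a map step emits that many distinct targets per vertex, and a reduce step collects the adjacency list. All function names and parameters (`power_law_degrees`, `map_vertex`, `reduce_vertex`, `generate_directed`, `alpha`, `max_degree`, `seed`) are hypothetical, and only the directed case is covered.

```python
import random
from collections import defaultdict


def power_law_degrees(n, alpha, max_degree, rng):
    """Sample one out-degree per vertex with P(d) proportional to d**(-alpha)."""
    support = list(range(1, max_degree + 1))
    weights = [d ** (-alpha) for d in support]
    return rng.choices(support, weights=weights, k=n)


def map_vertex(vertex, degree, n, seed):
    """Map step (assumed scheme): emit `degree` distinct edges leaving `vertex`."""
    # A per-record generator keeps each map call independent and reproducible.
    rng = random.Random(seed * 1_000_003 + vertex)
    # Sampling without replacement guarantees no repeated edges for this source.
    for t in rng.sample(range(n - 1), degree):
        # Shift targets at or above `vertex` by one to skip self-loops.
        yield vertex, (t if t < vertex else t + 1)


def reduce_vertex(vertex, targets):
    """Reduce step: collect all targets of one source into a sorted adjacency list."""
    return vertex, sorted(targets)


def generate_directed(n, alpha=2.0, max_degree=None, seed=0):
    """Run the map and reduce steps in one process; requires n >= 2."""
    max_degree = min(max_degree or n - 1, n - 1)
    rng = random.Random(seed)
    degrees = power_law_degrees(n, alpha, max_degree, rng)
    shuffled = defaultdict(list)  # stands in for the shuffle phase of MapReduce
    for vertex, degree in enumerate(degrees):
        for src, dst in map_vertex(vertex, degree, n, seed):
            shuffled[src].append(dst)
    return dict(reduce_vertex(v, ts) for v, ts in shuffled.items())


if __name__ == "__main__":
    graph = generate_directed(1000, alpha=2.0, seed=42)
    print(sum(len(ts) for ts in graph.values()), "edges generated")
```

Because every edge of a given source is produced inside a single map call, avoiding repeated edges needs no coordination between workers in this sketch. Only the out-degrees follow the sampled power law here; an undirected variant would additionally have to treat (u, v) and (v, u) as the same edge, which this example does not attempt.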