Abstract

Nowadays, many applications that involve big data can be modelled as graphs. In many cases, these graphs are too large to be loaded and processed on a single commodity computer. This has led to the development of frameworks that process large graphs by distributing them among the nodes of a cluster. To process a graph with these frameworks, the graph is first partitioned into smaller components, called subgraphs or partitions, which are then assigned to different nodes for parallel processing. Depending on the type of computation (e.g., computing PageRank or counting triangles), the nodes communicate with each other during execution, and this communication affects execution time. Graph partitioning is therefore an important step in distributed graph processing. Being able to determine the quality of a partition before processing is important, as it allows the execution time to be predicted ahead of the actual computation. A number of metrics for evaluating the quality of a graph partition exist, but studies show that these metrics may not serve as accurate predictors in many cases. In this work, we reviewed published papers on graph partitioning and identified and defined additional metrics in order to build a catalogue of these metrics.
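
To make the notion of partition quality concrete, one widely used metric is the edge-cut: the number of edges whose endpoints land in different partitions, which roughly tracks the inter-node communication a distributed computation will incur. The following minimal Python sketch (not taken from the paper; the function name and inputs are illustrative) computes it for a small example graph.

def edge_cut(edges, partition):
    """Count edges crossing partition boundaries.

    edges:     iterable of (u, v) pairs for an undirected graph
    partition: dict mapping each vertex to its partition id
    """
    return sum(1 for u, v in edges if partition[u] != partition[v])


# Example: a 4-cycle split into two partitions of two vertices each.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
partition = {0: 0, 1: 0, 2: 1, 3: 1}
print(edge_cut(edges, partition))  # 2 edges cross the cut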
