Abstract

We propose a new method of graph partitioning for big graphs that include a conceptual schema. The conceptual schema of a graph database, called a schema graph, is defined implicitly as part of the graph database itself. A graph database is stored in a distributed triple-store, i.e., a distributed database system for managing graph edges represented as triples. We define the statistics of a graph database on the basis of the schema graph. The statistics are gathered for all schema triples, i.e., the types of graph edges. The space of schema triples is partially ordered by the is-more-general relationship, which is defined through the class and predicate taxonomies. The graph partitioning method has two phases. In the first phase, a skeleton graph of the triple-store is computed. The skeleton graph is composed of the schema triples whose extensions are of an appropriate size to serve as the fragments of the distribution. The edges of the skeleton graph are selected in a top-down manner, i.e., from the most general schema triple to more specific schema triples. In the second phase, the edges of the skeleton graph are clustered into n partitions. The distance function used in the clustering algorithm is based on the statistics of the schema triples. The graph partitioning function maps each schema triple of the skeleton graph to its partition, which is stored on a separate data server. The partitioning function is well defined in that it maps the types of the triple-patterns to k fragments, where k corresponds to the size of the portion of the triple-store addressed by the triple-patterns. In other words, it maps the types of triple-patterns that address a large number of triples to multiple distributed fragments, and the types of triple-patterns that address few triples to a single fragment.
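
The two phases can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `SchemaTriple` class, the `max_size` threshold, and the example schema triples are hypothetical, and the second phase uses a simple greedy size-balancing rule as a stand-in, because the abstract does not specify the statistics-based distance function or the clustering algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaTriple:
    """Hypothetical schema triple with its extension size (a gathered statistic)."""
    name: str
    size: int  # number of stored triples covered by this schema triple
    children: list["SchemaTriple"] = field(default_factory=list)  # more specific triples


def select_skeleton(root, max_size):
    """Phase 1 (sketch): walk from the most general schema triple downwards.

    A schema triple is kept when its extension fits into max_size, or when
    it has no more specific triples to descend into; otherwise it is
    replaced by its more specific schema triples.
    """
    selected, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if node.name in seen:  # the order is partial, so a node may be reachable twice
            continue
        seen.add(node.name)
        if node.size <= max_size or not node.children:
            selected.append(node)
        else:
            stack.extend(node.children)
    return selected


def assign_partitions(skeleton, n):
    """Phase 2 (stand-in): greedy size balancing over n partitions.

    This is NOT the paper's distance-based clustering; it only shows the
    shape of a function mapping skeleton schema triples to partitions.
    """
    loads = [0] * n
    mapping = {}
    for triple in sorted(skeleton, key=lambda t: t.size, reverse=True):
        target = loads.index(min(loads))  # currently least-loaded partition
        mapping[triple.name] = target
        loads[target] += triple.size
    return mapping


# Hypothetical example hierarchy of schema triples.
root = SchemaTriple("(Thing, relatedTo, Thing)", 1000, [
    SchemaTriple("(Person, knows, Person)", 300),
    SchemaTriple("(Person, worksFor, Org)", 700, [
        SchemaTriple("(Employee, worksFor, Company)", 400),
        SchemaTriple("(Intern, worksFor, Company)", 300),
    ]),
])

skeleton = select_skeleton(root, max_size=400)
partition_of = assign_partitions(skeleton, n=2)
```

In this sketch the two overly large schema triples are replaced by their more specific triples, so a triple-pattern typed by the most general schema triple would touch several fragments, while a pattern typed by a small, specific schema triple would touch only one.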
