Data dependencies are a key concept in data management and have been studied in the context of data integration, data quality, and query optimization. With the increasing use of graph-structured data in diverse applications, interest in graph data dependencies has grown as well, and several classes of graph data dependencies have been proposed in the literature. In this work, we study the class of Graph Generating Dependencies (GGDs). Informally, a GGD expresses a constraint between two (possibly different) graph patterns, enforcing relationships on both the graph's data (via property value constraints) and its structure (via topological constraints). While most previously proposed classes of graph data dependencies focus on generalizing equality-generating dependencies to graph data, GGDs can express tuple- and equality-generating dependencies on property graphs, both of which find broad application in graph data management. In this paper, we study reasoning about GGDs on property graphs. We propose algorithms for three main reasoning problems, namely satisfiability, implication, and validation of GGDs, and analyze their complexity. Studying these problems clarifies the expressiveness and the limitations of GGDs in practical applications. To demonstrate the practical use of GGDs, we propose an algorithm that finds inconsistencies in data through GGD validation. Our experiments show that, even though validating GGDs has high computational complexity, GGDs can be used to find data inconsistencies in a feasible execution time on both synthetic and real-world data.