Abstract

This position paper provides an overview of our recent advances in the study of big graphs, from theory to systems to applications. We introduce a theory of bounded evaluability, to query big graphs by accessing a bounded amount of the data. Based on this, we propose a framework to query big graphs with constrained resources. Beyond queries, we propose functional dependencies for graphs, to detect inconsistencies in knowledge bases and catch spam in social networks. As an example application of big graph analysis, we extend association rules from itemsets to graphs for social media marketing. We also identify open problems in connection with querying, cleaning and mining big graphs.

Highlights

  • The study of graphs has generated renewed interest in the past decade

  • DFS takes O(|G|) time, not to mention graph pattern matching via subgraph isomorphism, for which it is NP-complete to decide whether Q(G) is empty, i.e., whether there exists a match of pattern Q in G

  • We show that GFDs can be used as data quality rules and are capable of catching inconsistencies commonly found in knowledge bases, as violations of the GFDs
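To make the last highlight concrete, the following is a minimal sketch of how a rule in the spirit of a GFD might be checked on a toy graph. It is an illustration only, not the paper's algorithm: the graph data, the rule ("if a country x has capital edges to cities y and z, then y and z carry the same name") and the helper name `capital_violations` are all hypothetical and chosen for this example.

```python
from itertools import combinations

# Toy graph (hypothetical data): node id -> (label, attributes)
nodes = {
    "au": ("country", {"name": "Australia"}),
    "c1": ("city", {"name": "Canberra"}),
    "c2": ("city", {"name": "Melbourne"}),
    "fr": ("country", {"name": "France"}),
    "c3": ("city", {"name": "Paris"}),
}

# Directed labelled edges: (source, edge label, target)
edges = {
    ("au", "capital", "c1"),
    ("au", "capital", "c2"),  # inconsistent: a second, different capital
    ("fr", "capital", "c3"),
}


def capital_violations(nodes, edges):
    """Return matches (x, y, z) of the pattern
    x -capital-> y, x -capital-> z (x a country; y, z distinct cities)
    for which the dependency y.name = z.name does not hold."""
    # Group capital cities by country: this enumerates the pattern matches.
    capitals = {}
    for src, label, dst in edges:
        if (label == "capital"
                and nodes[src][0] == "country"
                and nodes[dst][0] == "city"):
            capitals.setdefault(src, []).append(dst)

    violations = []
    for x, cities in capitals.items():
        # Each unordered pair of distinct capitals is one match of the pattern.
        for y, z in combinations(sorted(cities), 2):
            if nodes[y][1].get("name") != nodes[z][1].get("name"):
                violations.append((x, y, z))
    return violations


print(capital_violations(nodes, edges))
```

Each returned triple is a match of the pattern that fails the attribute dependency, which is the sense in which an inconsistency shows up as a violation of a rule. The sketch hard-codes a single pattern; general graph pattern matching is far more expensive, as the second highlight notes.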


Summary

Introduction

The study of graphs has generated renewed interest in the past decade. Graphs are an important source of big data and have found prevalent use in, e.g., social media marketing, knowledge discovery, transportation networks, mobile network analysis, computer vision, the study of adolescent drug use [93], and intelligence analysis.

When a graph G grows big, one can add more processors and parallelize the computation of the answer Q(G) to a query Q, to make the computation scale with G. Based on this assumption, several parallel graph query systems have been developed, e.g., Pregel [52] and GraphLab. Even for queries that are parallel scalable, however, small businesses often have constrained resources, such as a limited budget and a limited number of available processors, and cannot afford to rent thousands of Amazon EC2 instances. With these observations come the following questions. Is it possible to efficiently compute Q(G) when G is big and Q is expensive, and when we have constrained resources? We introduce a resource-constrained framework to query big graphs.

Catching Inconsistencies
Identifying Associations
Organization
Graphs
Graph Pattern Matching
Querying Big Graphs
Bounded Evaluability
Bounded Evaluation
Related Work
A Resource-Constrained Framework
A Framework to Query Big Graphs
Dependencies for Graphs
GFDs: Graph Functional Dependencies
Semantics
Satisfiability
Reasoning about GFDs
Implication
Complexity
Putting GFDs in Actions
Validation Analysis
Parallel Scalable Algorithms
Association Rules for Graphs
GPARs: Graph Pattern Association Rules
Adding Counting Quantifiers
Quantified Graph Patterns
Quantified pattern matching
Related Work
Discovering and Applying GPARs
Discovering GPARs
Identifying Potential Customers
Conclusion
Discovering Access Schema
Accuracy Guarantee
Parallel Scalability
Discovering GFDs
Repairing Graph-structured Data
Findings
Big Graph Mining