Abstract
This position paper provides an overview of our recent advances in the study of big graphs, from theory to systems to applications. We introduce a theory of bounded evaluability, to query big graphs by accessing a bounded amount of the data. Based on this, we propose a framework to query big graphs with constrained resources. Beyond queries, we propose functional dependencies for graphs, to detect inconsistencies in knowledge bases and catch spam in social networks. As an example application of big graph analyses, we extend association rules from itemsets to graphs for social media marketing. We also identify open problems in connection with querying, cleaning and mining big graphs.
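To make the idea of association rules over graphs concrete, here is a minimal Python sketch that evaluates one rule on a toy social graph. The rule, the edge labels `friend` and `likes`, the helper names, and the data are all hypothetical. Confidence is computed as a plain ratio of matches, a simplification rather than the exact measure defined for graph-pattern association rules.

```python
from collections import defaultdict

# Hypothetical toy social graph: (source, label, target) edges.
EDGES = [
    ("ann", "friend", "bob"),
    ("ann", "friend", "cat"),
    ("dan", "friend", "bob"),
    ("bob", "likes", "cafe_x"),
    ("cat", "likes", "cafe_x"),
    ("ann", "likes", "cafe_x"),
]


def build_index(edges):
    """Index edges by (source, label) so neighbours can be looked up directly."""
    index = defaultdict(set)
    for src, label, dst in edges:
        index[(src, label)].add(dst)
    return index


def rule_stats(edges):
    """Support and (simplified) confidence of the illustrative rule
    friend(x, y) AND likes(y, z)  =>  likes(x, z)."""
    index = build_index(edges)
    people = {src for src, label, _ in edges if label == "friend"}
    # Pairs (x, z) that match the antecedent pattern.
    antecedent = set()
    for x in people:
        for y in index[(x, "friend")]:
            for z in index[(y, "likes")]:
                antecedent.add((x, z))
    # Pairs that also satisfy the consequent edge likes(x, z).
    support = {(x, z) for (x, z) in antecedent if z in index[(x, "likes")]}
    confidence = len(support) / len(antecedent) if antecedent else 0.0
    return len(support), confidence
```

On this data `rule_stats(EDGES)` returns `(1, 0.5)`: `dan` matches the antecedent but not the consequent, which is the kind of node a marketer might target.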
Highlights
The study of graphs has generated renewed interest in the past decade
Even depth-first search (DFS) takes O(|G|) time on a big graph G, not to mention graph pattern matching via subgraph isomorphism, for which it is NP-complete to decide whether Q(G) is empty, i.e., whether there exists a match of pattern Q in G
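A minimal backtracking matcher in Python illustrates why pattern matching is costly: for each pattern node it tries candidate nodes of G, and in the worst case it may explore exponentially many partial mappings. Node labels, directed edges, the function name `has_match`, and the example graphs are assumptions made for this sketch; it only decides whether Q(G) is non-empty and is not an optimized algorithm.

```python
def has_match(pattern_nodes, pattern_edges, graph_nodes, graph_edges):
    """Decide whether pattern Q has a subgraph-isomorphic match in G.

    pattern_nodes / graph_nodes: dict mapping node -> label.
    pattern_edges / graph_edges: set of directed (u, v) edges.
    Plain backtracking, so worst-case time is exponential in |Q|.
    """
    order = list(pattern_nodes)

    def consistent(mapping, u, v):
        # Labels must agree and v must be unused (the mapping is injective).
        if pattern_nodes[u] != graph_nodes[v] or v in mapping.values():
            return False
        # A self-loop on u must be matched by a self-loop on v.
        if (u, u) in pattern_edges and (v, v) not in graph_edges:
            return False
        # Every pattern edge between u and an already-mapped node must exist in G.
        for w, img in mapping.items():
            if (u, w) in pattern_edges and (v, img) not in graph_edges:
                return False
            if (w, u) in pattern_edges and (img, v) not in graph_edges:
                return False
        return True

    def extend(mapping, i):
        if i == len(order):
            return True  # every pattern node is mapped: a match exists
        u = order[i]
        for v in graph_nodes:
            if consistent(mapping, u, v):
                mapping[u] = v
                if extend(mapping, i + 1):
                    return True
                del mapping[u]  # backtrack
        return False

    return extend({}, 0)


# Hypothetical example: "a person who follows a person who posts a blog".
Q_NODES = {"p": "person", "q": "person", "b": "blog"}
Q_EDGES = {("p", "q"), ("q", "b")}
G_NODES = {1: "person", 2: "person", 3: "blog", 4: "person"}
G_EDGES = {(1, 2), (2, 3), (4, 1)}
```

Here `has_match(Q_NODES, Q_EDGES, G_NODES, G_EDGES)` finds the match p→1, q→2, b→3; removing the edge (2, 3) from G leaves Q(G) empty.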
We show that GFDs can be used as data quality rules and are capable of catching inconsistencies commonly found in knowledge bases, as violations of the GFDs
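As a hedged illustration of using a GFD as a data quality rule, the sketch below checks one hypothetical GFD on a toy knowledge graph. Its pattern has a country x with `capital` edges to cities y and z, and the dependency ∅ → y.name = z.name requires that a country has a unique capital. The rule, the attribute names, and the data are assumptions; matches are found by direct enumeration rather than by general pattern matching.

```python
from itertools import product

# Hypothetical toy knowledge graph: node attributes and labelled edges.
NODES = {
    "c1": {"type": "country", "name": "Australia"},
    "t1": {"type": "city", "name": "Canberra"},
    "t2": {"type": "city", "name": "Sydney"},
    "c2": {"type": "country", "name": "Japan"},
    "t3": {"type": "city", "name": "Tokyo"},
}
EDGES = {("c1", "capital", "t1"), ("c1", "capital", "t2"), ("c2", "capital", "t3")}


def capital_gfd_violations(nodes, edges):
    """Check the illustrative GFD Q[x, y, z](empty -> y.name = z.name),
    where Q has edges capital(x, y) and capital(x, z).
    Returns the matches (x, y, z) that violate the dependency."""
    capitals = {}
    for src, label, dst in edges:
        if (label == "capital" and nodes[src]["type"] == "country"
                and nodes[dst]["type"] == "city"):
            capitals.setdefault(src, []).append(dst)
    violations = []
    for x, cities in capitals.items():
        # Every pair (y, z) of capital targets of x is a match of Q.
        for y, z in product(cities, repeat=2):
            # X is empty, so every match must satisfy Y.
            if nodes[y]["name"] != nodes[z]["name"]:
                violations.append((x, y, z))
    return violations
```

The two conflicting capitals recorded for `c1` surface as violations, while `c2` satisfies the rule.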
Summary
The study of graphs has generated renewed interest in the past decade. Graphs make an important source of big data and have found prevalent use in, e.g., social media marketing, knowledge discovery, transportation networks, mobile network analysis, computer vision, the study of adolescent drug use [93], and intelligence analysis. A common practice for querying big graphs is to assume that queries are parallel scalable. That is, when G grows big, we add more processors and parallelize the computation of Q(G), to make the computation scale with G. Based on this assumption, several parallel graph query systems have been developed, e.g., Pregel [52] and GraphLab. Even for queries that are parallel scalable, however, small businesses often have constrained resources, such as a limited budget and few available processors, and cannot afford to rent thousands of Amazon EC2 instances. These observations raise the following question: is it possible to efficiently compute Q(G) when G is big, Q is expensive, and resources are constrained? We introduce a resource-constrained framework to query big graphs.
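The following Python sketch illustrates, under stated assumptions, how bounded evaluability lets a query touch only a bounded amount of data. It models two hypothetical access constraints: each year has at most N1 award-winning movies, and each movie has at most N2 listed actors, both reachable through indexes. The class `AccessConstraint`, the query, and the data are invented for illustration; the point is that the plan fetches at most N1 + N1 * N2 items, no matter how large G is.

```python
from dataclasses import dataclass, field


@dataclass
class AccessConstraint:
    """Hypothetical access constraint X -> (l, N): every X-value has at most
    N l-labelled neighbours, retrievable through an index."""
    bound: int
    index: dict = field(default_factory=dict)

    def fetch(self, key):
        items = self.index.get(key, [])
        if len(items) > self.bound:
            raise ValueError("index violates its cardinality bound")
        return items


def award_actors(year, movies_by_year, actors_by_movie):
    """Bounded plan: actors of award-winning movies released in `year`.
    Returns the answer and the number of data items accessed, which is at
    most N1 + N1 * N2 regardless of the size of the whole graph."""
    accessed = 0
    actors = set()
    movies = movies_by_year.fetch(year)  # at most N1 movies
    accessed += len(movies)
    for movie in movies:
        cast = actors_by_movie.fetch(movie)  # at most N2 actors each
        accessed += len(cast)
        actors.update(cast)
    return actors, accessed


# Hypothetical example with N1 = 2 and N2 = 3.
movies_by_year = AccessConstraint(bound=2, index={2014: ["m1", "m2"]})
actors_by_movie = AccessConstraint(bound=3, index={"m1": ["a1", "a2"], "m2": ["a2", "a3"]})
actors, accessed = award_actors(2014, movies_by_year, actors_by_movie)
```

Here the query returns three actors after accessing six items; adding unrelated movies or years to the indexes would not change that count.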