Graphs are widely used to store complex data in domains such as social networks, recommendation engines, computer networks, and bioinformatics. As the amount of data on the Internet has grown rapidly in recent years, designing scalable systems that process huge graphs efficiently has become a critical issue. Graph compression techniques are often used to store and process graph data efficiently in memory or on disk, and to reduce the time spent transferring it. However, most existing graph compression approaches are syntactic: they focus on the graph structure and reduce it by serialization or redundancy removal. In this paper we focus on a semantic approach, query-based graph data reduction, which reduces a graph by preserving only the information relevant to the queries an application needs. We study several classical graph problems and their applications, and design a suite of graph reduction algorithms that generate reduced graphs on which an application can still compute the same solutions. In addition, we design a synthesis method that combines existing graph reduction algorithms to generate a reduced graph for a complex graph problem with more than one constraint. We also discuss incremental maintenance, which updates a reduced graph when the original graph is modified without reprocessing the whole graph. Finally, we conduct experiments to compare the reduction rates of our algorithms on data of different sizes and types.