Queryable compression on streaming social networks

Sridhar Radhakrishnan,Michael Nelson,Chandra N. Sekharan,Amlan Chatterjee

doi:10.1109/bigdata.2017.8258020

Abstract

In this era of social networks, we find ourselves with a collection of massive, changing graphs. Each of these graphs contain a set of nodes (individuals) and a set of edges among the nodes (relationships). How a graph is represented in a data structure determines what information is easy to obtain from it. However, many graphs are so large that even basic data structure representations (e.g. adjacency lists) do not fit in main memory. Therefore, it is an interesting field of study to design compressed data structures that facilitate certain query functions. Since we are dealing with social networks, our structure will also be able to stream edges directly into the compressed graph. We introduce our social network compressed data structure as an indexed array of compressed binary trees. We further minimize memory overhead by directly constructing the graph without any intermediate structure. We also provide fast access methods for edge existence (does an edge exist between two nodes?), neighbor queries (list a node's neighbors), and streaming operations (add/remove nodes/edges). We test our algorithms on public, anonymized, massive graphs such as Friendster, Live-Journal, Pokec, Twitter, and others. Our empirical evaluation is based on several parameters including time to compress, memory required by the compression algorithm, size of compressed graph, and time to execute queries.

Full Text