Abstract

As an important branch of big data processing, big graph processing has become increasingly popular in recent years. Strongly connected component (SCC) computation is a fundamental operation on directed graphs, where an SCC is a maximal subgraph S of a directed graph G in which every pair of nodes is mutually reachable within S. By contracting each SCC into a single node, a large general directed graph can be represented by a small directed acyclic graph (DAG). In the literature, there are I/O efficient semi-external algorithms that compute all SCCs of a graph G under the assumption that all nodes of G fit in main memory. However, many real graphs are so large that even their nodes cannot reside entirely in main memory. In this paper, we study new I/O efficient external algorithms to find all SCCs of a directed graph G whose nodes cannot fit entirely in main memory. To overcome the deficiencies of the existing external graph contraction based approach, which usually cannot stop in a finite number of iterations, and of the external DFS based approach, which generates a large number of random I/Os, we explore a new contraction-expansion based approach. In the graph contraction phase, instead of contracting the whole graph as the contraction based approach does, we contract only the nodes of the graph, which is much more selective. The contraction phase stops when all nodes of the graph fit in main memory, so that the semi-external algorithm can be used for SCC computation. In the graph expansion phase, the graph is expanded in the reverse order of its contraction, and the SCCs of all nodes in the graph are computed. Both the graph contraction phase and the graph expansion phase use only I/O efficient sequential scans and external sorts of the nodes/edges in the graph. Our algorithm leverages the efficiency of the semi-external SCC computation algorithm and usually stops in a small number of iterations. We further optimize our approach by reducing the number of nodes and edges of the contracted graph in each iteration. We conduct extensive experimental studies using both real and synthetic web-scale graphs to confirm the I/O efficiency of our approaches.
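
To make the definitions above concrete, the following is a minimal in-memory sketch of SCC computation and of contracting each SCC into a single node of a DAG. It is not the external contraction-expansion algorithm of the paper: it assumes the whole graph fits in main memory and uses the textbook two-pass (Kosaraju) method. The function names `strongly_connected_components` and `condense` and the toy graph are illustrative assumptions.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Return {node: scc_id} via Kosaraju's two-pass method (in-memory only)."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: iterative DFS on G, recording nodes by increasing finishing time.
    visited, order = set(), []
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                # All successors explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finishing time;
    # each sweep from an unlabelled root collects exactly one SCC.
    scc_of, next_id = {}, 0
    for root in reversed(order):
        if root in scc_of:
            continue
        scc_of[root] = next_id
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in rev[node]:
                if prev not in scc_of:
                    scc_of[prev] = next_id
                    stack.append(prev)
        next_id += 1
    return scc_of


def condense(edges, scc_of):
    """Contract each SCC into one node and return the edge set of the DAG."""
    return {(scc_of[u], scc_of[v]) for u, v in edges if scc_of[u] != scc_of[v]}


# Toy graph (hypothetical): cycle 1 -> 2 -> 3 -> 1, cycle 4 <-> 5, bridge 3 -> 4.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4)]
scc_of = strongly_connected_components(nodes, edges)
dag_edges = condense(edges, scc_of)
```

On this toy graph, nodes 1, 2, and 3 receive one SCC label, nodes 4 and 5 receive another, and the contracted DAG has a single edge between the two labels. Both passes rely on random access to adjacency lists, which is the access pattern that becomes expensive once the graph resides on disk.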

Highlights

  • A graph is an important data structure for modelling complex relationships among entities

  • We study the problem of strongly connected component (SCC) computation, which is a fundamental operation on directed graphs

  • Computing SCCs on large graphs is in high demand in many real applications that rely on topological sort, reachability query processing, and graph pattern matching

Summary

INTRODUCTION

A graph is an important data structure for modelling complex relationships among entities. A road network, a social network, and the entire WWW can be modelled as graphs, and all such graphs are huge. In [16], Hellings et al. propose an efficient algorithm for external bisimulation on graphs, where all nodes are assumed to be in reverse topological order and stored on disk. This requires finding all SCCs in a preprocessing step. [...] +sort(|E|)) by maintaining the nodes that should not be traversed using tournament trees [17] and buffered repository trees [8] respectively, where B is the disk block size. Despite their theoretical guarantees, these algorithms are considered impractical for the general directed graphs encountered in real applications, due to the large number of random I/Os they generate. In our approach, we stop contracting when all nodes fit in main memory and then process the contracted graph using an I/O efficient semi-external algorithm.

PROBLEM DEFINITION
A NEW CONTRACTION-EXPANSION APPROACH
GRAPH CONTRACTION
GRAPH EXPANSION
PERFORMANCE STUDIES
RELATED WORK
Findings
CONCLUSIONS
