Abstract
Big data applications increasingly rely on the analysis of large graphs. In recent years, a number of out-of-core graph processing systems have been proposed to process graphs with billions of edges on a single commodity computer by making efficient use of secondary storage (e.g., hard disk, SSD). Unfortunately, these systems continue to suffer from poor performance, despite the many solutions proposed to address the disk I/O bottleneck, a commonly recognized root cause. Our experimental results show that another root cause is the subgraph construction phase of graph processing, which induces a large number of random memory accesses that substantially weaken cache access locality and thus greatly degrade performance. In this paper, we propose LOSC, an efficient out-of-core graph processing system that substantially reduces the overhead of subgraph construction. LOSC uses a locality-optimized subgraph construction scheme that significantly improves the in-memory data access locality of the subgraph construction phase. Furthermore, LOSC adopts a compact edge storage format and a lightweight replication of vertices to reduce I/O traffic and improve computation efficiency. Extensive evaluation results show that LOSC is 9.4× and 5.1× faster than GraphChi and GridGraph, respectively, two representative out-of-core systems. In addition, LOSC outperforms other state-of-the-art out-of-core graph processing systems, including FlashGraph, GraphZ, G-Store and NXGraph. For example, LOSC can be up to 6.9× faster than FlashGraph.