Abstract

Computing connected components is a core operation on graph data. Since billion-scale graphs cannot be resident in memory of a single server, several approaches based on distributed machines have recently been proposed. The representative methods are $\mathsf{Hash\hbox{-}To\hbox{-}Min}$ and $\mathsf{PowerGraph}$ . $\mathsf{Hash\hbox{-}To\hbox{-}Min}$ is the state-of-the art disk-based distributed method which minimizes the number of MapReduce rounds. $\mathsf{PowerGraph}$ is the-state-of-the-art in-memory distributed system, which is typically faster than the disk-based distributed one, however, requires a lot of machines for handling billion-scale graphs. In this paper, we propose an I/O efficient parallel algorithm for billion-scale graphs in a single PC. We first propose the Disk-based Sequential access-oriented Parallel processing (DSP) model that exploits sequential disk access in terms of disk I/Os and parallel processing in terms of computation. We then propose an ultra-fast disk-based parallel algorithm for computing connected components, $\mathsf{DSP\hbox{-}CC}$ , which largely improves the performance through sequential disk scan and page-level cache-conscious parallel processing . Extensive experimental results show that $\mathsf{DSP\hbox{-}CC}$ 1) computes connected components in billion-scale graphs using the limited memory size whereas in-memory algorithms can only support medium-sized graphs with the same memory size, and 2) significantly outperforms all distributed competitors as well as a representative disk-based parallel method.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.