Abstract

Recent studies show that disk-based graph computation systems running on just a single PC can be highly competitive with cluster-based systems on large-scale problems. Inspired by this remarkable progress, we develop VENUS, a disk-based graph computation system that handles billion-scale graphs efficiently on a commodity PC. VENUS adopts a novel computing architecture featuring vertex-centric "streamlined" processing: the graph is loaded sequentially, and an update function is executed for each vertex in parallel on the fly. VENUS deliberately avoids loading batch edge data by separating read-only structure data from mutable vertex data on disk, and it minimizes random IOs by caching vertex data in main memory whenever possible. The streamlined processing is realized through an efficient sequential scan over the massive structure data and fast feeding of the update function for a large number of vertices. Extensive evaluation on large real-world and synthetic graphs demonstrates the efficiency of VENUS. For example, to run the PageRank algorithm on a Twitter graph with 42 million vertices and 1.4 billion edges, Spark needs 8.1 minutes with 50 machines and GraphChi takes 13 minutes using a high-speed SSD, whereas VENUS takes only 5 minutes on one machine with an ordinary hard disk.
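To make the idea of vertex-centric streamlined processing concrete, here is a minimal in-memory sketch of one PageRank computation in that style. It assumes a hypothetical record format of `(vertex id, in-neighbor ids)` for the read-only structure data, which is scanned sequentially, while the mutable vertex values live in a separate array standing in for the cached vertex data. The function names and record layout are illustrative assumptions, not the VENUS API; the real system streams structure data from disk and runs updates in parallel, whereas this sketch runs them sequentially for clarity.

```python
from typing import Iterator, List, Tuple

# Hypothetical structure record: (vertex id, list of in-neighbor ids).
# This read-only structure data is kept separate from mutable vertex values.
StructureRecord = Tuple[int, List[int]]


def stream_structure(records: List[StructureRecord]) -> Iterator[StructureRecord]:
    """Yield structure records one at a time, mimicking a sequential scan.

    A disk-based system would read these records from a file in order;
    here they simply come from an in-memory list.
    """
    for record in records:
        yield record


def pagerank_streamlined(
    records: List[StructureRecord],
    out_degree: List[int],
    num_vertices: int,
    iterations: int = 20,
    damping: float = 0.85,
) -> List[float]:
    """Run PageRank by streaming structure records and updating vertices on the fly.

    Every vertex must appear in `records` (with an empty list if it has no
    in-edges). Rank mass of vertices without out-edges is not redistributed
    in this simplified variant.
    """
    # Mutable vertex data, assumed small enough to cache fully in memory.
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        new_rank = [0.0] * num_vertices
        # Streamlined pass: each record is consumed as soon as it is read,
        # and the update function runs for that vertex immediately.
        for v, in_neighbors in stream_structure(records):
            incoming = sum(rank[u] / out_degree[u] for u in in_neighbors)
            new_rank[v] = (1.0 - damping) / num_vertices + damping * incoming
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Directed 3-cycle: 0 -> 1, 1 -> 2, 2 -> 0.
    cycle = [(0, [2]), (1, [0]), (2, [1])]
    print(pagerank_streamlined(cycle, out_degree=[1, 1, 1], num_vertices=3))
```

On the symmetric 3-cycle in the usage example, every vertex keeps a rank of about 1/3. The design point the sketch illustrates is that the structure data is touched only once per iteration, in order, while all random accesses (`rank[u]`) hit the in-memory vertex array rather than the disk.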
