Abstract

TuX² is a new distributed graph engine that bridges graph computation and distributed machine learning. TuX² inherits the benefits of an elegant graph computation model, efficient graph layout, and balanced parallelism to scale to billion-edge graphs, while being extended and optimized for distributed machine learning: it supports heterogeneity in the data model, Stale Synchronous Parallel (SSP) scheduling, and a new Mini-batch, Exchange, GlobalSync, and Apply (MEGA) programming model. TuX² further introduces a hybrid vertex-cut graph optimization and supports various consistency models for fault tolerance in machine learning. We have developed a set of representative distributed machine learning algorithms in TuX², covering both supervised and unsupervised learning. Compared to their implementations on distributed machine learning platforms, writing these algorithms in TuX² takes only about 25 percent of the code: our graph computation model hides the detailed management of data layout, partitioning, and parallelism from developers. An extensive evaluation of TuX², using large datasets with up to 64 billion edges, shows that TuX² outperforms PowerGraph and PowerLyra, two state-of-the-art distributed graph engines, by an order of magnitude, while beating two state-of-the-art distributed machine learning systems by at least 60 percent.
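To make the MEGA phases named above concrete, the following is a minimal single-process sketch of how a Mini-batch, Exchange, GlobalSync, Apply loop might be structured. It is illustrative only: the abstract names the phases, but the type and function signatures here (MegaProgram, Exchange, GlobalSync, Apply as written) are assumptions, not TuX²'s actual API.

    // Hypothetical sketch of a MEGA-style (Mini-batch, Exchange, GlobalSync,
    // Apply) execution loop; names and signatures are illustrative only.
    #include <cstdio>
    #include <vector>

    struct Edge { int src, dst; double weight; };

    // One user-defined MEGA program: Exchange runs per edge in a mini-batch,
    // GlobalSync aggregates shared state across workers, Apply updates vertices.
    struct MegaProgram {
        std::vector<double> vertex_value;   // per-vertex model parameters
        double global_accum = 0.0;          // state reduced in GlobalSync

        void Exchange(const Edge& e) {
            // propagate data along an edge (e.g., a gradient contribution)
            global_accum += e.weight * vertex_value[e.src];
        }
        void GlobalSync() {
            // in a real engine this would be a cross-worker reduction;
            // here it is a no-op placeholder for the single-process sketch
        }
        void Apply(int v) {
            // update a vertex using the synchronized global state
            vertex_value[v] += 0.01 * global_accum;
        }
    };

    int main() {
        MegaProgram p;
        p.vertex_value = {1.0, 2.0, 3.0};
        std::vector<Edge> batch = {{0, 1, 0.5}, {1, 2, 0.25}};  // one mini-batch

        p.global_accum = 0.0;
        for (const Edge& e : batch) p.Exchange(e);  // Exchange phase
        p.GlobalSync();                             // GlobalSync phase
        for (int v = 0; v < 3; ++v) p.Apply(v);     // Apply phase

        std::printf("v0=%.3f accum=%.3f\n", p.vertex_value[0], p.global_accum);
        return 0;
    }

In a real distributed engine, GlobalSync would perform a cross-worker reduction of shared state, and under SSP scheduling workers could proceed with a bounded amount of staleness rather than waiting on the no-op barrier shown here.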
