Abstract

Distributed machine learning (ML) has triggered tremendous research interest in recent years. Stochastic gradient descent (SGD) is one of the most popular algorithms for training ML models and has been implemented in almost all distributed ML systems, such as Spark MLlib, Petuum, MXNet, and TensorFlow. However, current implementations often incur huge communication and memory overheads when it comes to large models. One important reason for this inefficiency is the row-oriented scheme (RowSGD) that existing systems use to partition the training data, which forces them to adopt a centralized model management strategy that leads to a vast amount of data exchange over the network. We propose a novel, column-oriented scheme (ColumnSGD) that partitions training data by columns rather than by rows. As a result, the ML model can be partitioned by columns as well, leading to a distributed configuration in which individual data and model partitions are collocated on the same machine. Exploiting this locality property, we develop a simple yet powerful computation framework that significantly reduces communication overheads and memory footprints compared to RowSGD for large-scale ML models such as generalized linear models (GLMs) and factorization machines (FMs). We implement ColumnSGD on top of Apache Spark and study its performance both analytically and experimentally. Experimental results on both public and real-world datasets show that ColumnSGD is up to 930× faster than MLlib, 63× faster than Petuum, and 14× faster than MXNet.
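
To make the locality argument concrete, the sketch below simulates the column-partitioned SGD idea for logistic regression (a GLM) in a single Python process. The worker count, the slice bookkeeping, and the in-memory "aggregation" of partial sums are illustrative assumptions rather than the paper's Spark implementation; the point is only that each simulated worker exchanges a mini-batch-sized vector of partial dot products, never its data or model slice.

# Minimal single-process sketch of column-partitioned SGD for logistic
# regression. Worker/slice names and the in-memory aggregation step are
# assumptions for illustration, not the ColumnSGD system itself.
import numpy as np

rng = np.random.default_rng(0)
n, d, num_workers = 1000, 40, 4                 # examples, features, simulated workers
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (X @ true_w + 0.1 * rng.normal(size=n) > 0).astype(float)

# Column partitioning: each worker owns a slice of the features AND the
# matching slice of the model, so data and model are collocated.
feature_slices = np.array_split(np.arange(d), num_workers)
w_parts = [np.zeros(len(cols)) for cols in feature_slices]

lr, batch_size = 0.1, 64
for step in range(200):
    batch = rng.choice(n, size=batch_size, replace=False)
    yb = y[batch]

    # Each worker computes partial dot products over its own columns.
    # Only these batch_size-length vectors would cross the network.
    partial = [X[np.ix_(batch, cols)] @ w
               for cols, w in zip(feature_slices, w_parts)]
    margins = np.sum(partial, axis=0)           # aggregation of partial sums

    # The aggregated margins are sent back; each worker then computes the
    # gradient for its own model slice locally and updates it in place.
    grad_common = 1.0 / (1.0 + np.exp(-margins)) - yb
    for i, cols in enumerate(feature_slices):
        grad_i = X[np.ix_(batch, cols)].T @ grad_common / batch_size
        w_parts[i] -= lr * grad_i

Under these assumptions, the per-step traffic grows with the mini-batch size and the number of workers rather than with the model dimension, which is the intuition behind ColumnSGD's reduced communication overhead relative to a centralized, row-oriented setup.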
