JAMPI: Efficient Matrix Multiplication in Spark Using Barrier Execution Mode

Tamas Foldi, Chris Von Csefalvay, Nicolas A Perez

doi:10.3390/bdcc4040032

Open Access

https://doi.org/10.3390/bdcc4040032

Journal: Big Data and Cognitive Computing
Publication Date: Nov 5, 2020
Citations: 5
License type: CC BY 4.0

Affiliation: Google (United States)

Abstract

The new barrier mode in Apache Spark allows for embedding distributed deep learning training as a Spark stage to simplify the distributed training workflow. In Spark, a task in a stage does not depend on any other tasks in the same stage, and hence it can be scheduled independently. However, several algorithms require more sophisticated inter-task communications, similar to the MPI paradigm. By combining distributed message passing (using asynchronous network IO), OpenJDK’s new auto-vectorization and Spark’s barrier execution mode, we can add non-map/reduce-based algorithms, such as Cannon’s distributed matrix multiplication, to Spark. We document an efficient distributed matrix multiplication using Cannon’s algorithm, which significantly improves on the performance of the existing MLlib implementation. Used within a barrier task, the algorithm described herein yields a performance increase of up to 24% on a 10,000 × 10,000 square matrix, with a significantly lower memory footprint. Applications of efficient matrix multiplication include, among others, accelerating the training and implementation of deep convolutional neural network-based workloads, and thus such efficient algorithms can play a ground-breaking role in the faster and more efficient execution of even the most complicated machine learning tasks.
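As an illustration of the technique the abstract names, the following is a minimal single-process sketch of Cannon's algorithm in pure Python. It is not the JAMPI implementation: the function names (`cannon_multiply`, `split_blocks`, and so on) are hypothetical, the q × q process grid is simulated with nested lists, and the block shifts that would be network exchanges between barrier tasks are modelled here as list rotations. It assumes square matrices whose size is divisible by the grid dimension `q`.

```python
def matmul(x, y):
    """Plain dense product of two square matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add_into(acc, x):
    """Add matrix x element-wise into the accumulator acc (in place)."""
    for i, row in enumerate(x):
        for j, value in enumerate(row):
            acc[i][j] += value


def split_blocks(m, q):
    """Split an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    size = len(m) // q
    return [[[row[j * size:(j + 1) * size] for row in m[i * size:(i + 1) * size]]
             for j in range(q)]
            for i in range(q)]


def join_blocks(blocks):
    """Reassemble a q x q grid of equally sized blocks into one matrix."""
    q = len(blocks)
    size = len(blocks[0][0])
    return [[blocks[i // size][j // size][i % size][j % size]
             for j in range(q * size)]
            for i in range(q * size)]


def cannon_multiply(a, b, q):
    """Multiply square matrices a and b on a simulated q x q process grid."""
    n = len(a)
    if n % q != 0:
        raise ValueError("matrix size must be divisible by the grid dimension")
    size = n // q
    a_blk = split_blocks(a, q)
    b_blk = split_blocks(b, q)

    # Initial skew: block row i of A moves left by i positions,
    # block column j of B moves up by j positions.
    a_blk = [[a_blk[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b_blk = [[b_blk[(i + j) % q][j] for j in range(q)] for i in range(q)]

    # One zero-initialised accumulator block per simulated process.
    c_blk = [[[[0] * size for _ in range(size)] for _ in range(q)]
             for _ in range(q)]

    for _ in range(q):
        # Each simulated process (i, j) multiplies the blocks it currently holds.
        for i in range(q):
            for j in range(q):
                add_into(c_blk[i][j], matmul(a_blk[i][j], b_blk[i][j]))
        # Shift A blocks one step left and B blocks one step up. In a
        # distributed setting this is where neighbouring tasks would exchange
        # blocks and synchronise before the next round.
        a_blk = [[a_blk[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_blk = [[b_blk[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    return join_blocks(c_blk)
```

Calling `cannon_multiply(a, b, 2)` on two 4 × 4 matrices should give the same result as `matmul(a, b)`. The point of the sketch is that every round requires all grid positions to finish before blocks are shifted, which is the kind of inter-task dependency that independently scheduled tasks cannot express and that a barrier stage is meant to accommodate.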

Highlights

The past decade has seen the emergence of two immensely powerful processes in tandem: the rise of big data handling solutions, such as Apache Spark, on the one hand, and the apotheosis of deep learning as the tool of choice for demanding computational solutions for machine learning problems on the other.
Comparative analysis of runtimes over a range of matrix sizes reveals that JAMPI is significantly superior to MLlib, even when over-partitioned.
When normalized against JAMPI’s execution times over 16 and 64 cores, execution time is slower for smaller matrices due to the need to establish and run the barrier execution task.

Summary

Introduction

The past decade has seen the emergence of two immensely powerful processes in tandem: the rise of big data handling solutions, such as Apache Spark, on the one hand, and the apotheosis of deep learning as the tool of choice for demanding computational solutions for machine learning problems on the other. The big data paradigm has been designed primarily around RDDs and the DataFrame-based API, and this outlook has dominated the development of Apache Spark. The future of deep learning over big data depends greatly on facilitating the convergence of these two worlds into a single, unified paradigm: the use of well-designed big data management tools, such as Apache Spark, to meet the demands of deep learning. The road towards this convergence depends on the development of efficient matrix primitives that facilitate rapid calculations over distributed networks and large data sets.

Similar Papers

Stark: Fast and Scalable Strassen’s Matrix Multiplication Using Apache Spark
Chandan Misra ... Sourangshu Bhattacharya
IEEE Transactions on Big Data | VOL. 8
01 Jun 2022

Distributed Matrix Multiplication Performance Estimator for Machine Learning Jobs in Cloud Computing
Myungjun Son ... Kyungyong Lee
01 Jul 2018

Applications of Matrix Multiplication
Hayatullah Saeed ... Mohammad Azim Nazari
Journal for Research in Applied Sciences and Biotechnology | VOL. 3
02 Jun 2024

Random Sampling for Distributed Coded Matrix Multiplication
Wei-Ting Chang ... Ravi Tandon
01 May 2019
