Abstract

In the last few years, Apache Spark has become the de facto standard framework for big data systems in both industry and academic projects. Spark is used to execute compute‐ and data‐intensive workflows in diverse areas such as biology and astronomy. Although Spark is an easy‐to‐install framework, it has more than one hundred parameters to be set, in addition to the domain‐specific parameters of each workflow. Thus, to execute Spark‐based workflows efficiently, the user has to fine‐tune a myriad of Spark and workflow parameters (eg, partitioning strategy, the average size of a DNA sequence, etc.). This configuration task cannot be performed manually in a trial‐and‐error manner, since it is tedious and error‐prone. This article proposes an approach that focuses on generating interpretable predictive machine learning models (ie, decision trees) and then extracting useful rules (ie, patterns) from these models; these rules can be applied by nonexpert users to configure parameters of future executions of the workflow and of Spark. In the experiments presented in this article, the proposed parameter configuration approach led to better performance in processing Spark workflows. Finally, the approach introduced here reduced the number of parameters to be configured by identifying, in the predictive model, the most relevant domain‐specific parameters related to workflow performance.
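The core idea of the approach can be illustrated with a minimal sketch, assuming scikit-learn and purely hypothetical features and runtimes (the parameter names, data, and thresholds below are illustrative only and do not come from the article):

# Minimal sketch of the interpretable-model idea described above, assuming
# scikit-learn and hypothetical Spark/workflow parameters; the actual
# feature set, executions, and rules in the article will differ.
from sklearn.tree import DecisionTreeRegressor, export_text
import numpy as np

# Hypothetical training data: each row is one past workflow execution,
# columns are configuration parameters, target is the observed runtime (s).
feature_names = ["num_partitions", "executor_memory_gb", "avg_dna_seq_size_kb"]
X = np.array([
    [64,  4, 120],
    [128, 4, 120],
    [128, 8, 300],
    [256, 8, 300],
])
y = np.array([410.0, 350.0, 520.0, 460.0])  # observed runtimes in seconds

# Fit a shallow decision tree so the model stays human-interpretable.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

# Export the tree as if-then rules that a nonexpert user could consult
# when configuring future executions of the workflow and Spark.
print(export_text(tree, feature_names=feature_names))

In this sketch, the printed if-then rules play the role of the extracted patterns, and the tree depth bound keeps the model small enough to remain interpretable.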
