Automated Deployment of a Spark Cluster with Machine Learning Algorithm Integration

A.M. Fernández, D. Gutiérrez-Avilés, F. Martínez-Álvarez, A. Troncoso

doi:10.1016/j.bdr.2020.100135


Open Access

https://doi.org/10.1016/j.bdr.2020.100135


Journal: Big Data Research	Publication Date: Mar 1, 2020
Citations: 5	License type: cc-by-nc-nd

Affiliation: Universidad Pablo de Olavide

Abstract

The vast amount of data stored nowadays has turned big data analytics into a very popular research field. The Spark distributed computing platform has emerged as a dominant and widely used paradigm for cluster deployment and big data analytics. However, getting a cluster up and running is still a time-consuming task when done manually, because of the requirements that all nodes must fulfill. This work introduces LadonSpark, an open-source and non-commercial solution to configure and deploy a Spark cluster automatically. It has been specially designed for easy and efficient management of a Spark cluster, with a friendly graphical user interface that automates the deployment of a cluster and quickly starts up the Hadoop distributed file system. Moreover, LadonSpark can integrate any algorithm into the system: the user only needs to provide the executable file and the number of required inputs for proper parametrization. Source code developed in Scala, R, Python, or Java is supported on LadonSpark. In addition, clustering, regression, classification, and association rules algorithms are already integrated, so users can test its usability right after installation.
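As a rough illustration of the integration idea described in the abstract (an executable plus a declared number of inputs), the sketch below builds a standard `spark-submit` command line and checks the argument count first. It is not LadonSpark's actual implementation: the function name `build_submit_command`, the default master URL, and the example file and class names are all hypothetical. Only the `spark-submit` options `--master` and `--class` are taken from Spark itself.

```python
from pathlib import Path
from typing import List, Optional


def build_submit_command(executable: str, n_inputs: int, args: List[str],
                         master: str = "spark://master:7077",  # hypothetical URL
                         main_class: Optional[str] = None) -> List[str]:
    """Build a spark-submit argument list for a user-supplied executable.

    Hypothetical helper: validates that the number of arguments matches
    the declared number of inputs, then assembles the command.
    """
    if len(args) != n_inputs:
        raise ValueError(f"expected {n_inputs} inputs, got {len(args)}")

    # spark-submit accepts JARs (Scala/Java), Python scripts, and R scripts.
    suffix = Path(executable).suffix.lower()
    if suffix not in {".jar", ".py", ".r"}:
        raise ValueError(f"unsupported executable type: {suffix!r}")

    cmd = ["spark-submit", "--master", master]
    # --class only applies to JVM applications packaged as a JAR.
    if suffix == ".jar" and main_class is not None:
        cmd += ["--class", main_class]
    return cmd + [executable] + list(args)


if __name__ == "__main__":
    # Example: a hypothetical clustering JAR taking two inputs.
    print(" ".join(build_submit_command(
        "kmeans.jar", 2, ["data.csv", "3"], main_class="example.KMeans")))
```

Returning the command as a list rather than a single string lets it be passed directly to `subprocess.run` without shell quoting issues.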
