SP-BRAIN: scalable and reliable implementations of a supervised relevance-based machine learning algorithm

Valerio Morfino, Salvatore Rampone, Emanuel Weitschek

doi:10.1007/s00500-019-04366-9

Abstract

In this work, new implementations of the U-BRAIN (Uncertainty-managing Batch Relevance-Based Artificial Intelligence) supervised machine learning algorithm are described. The implementations, referred to as SP-BRAIN (SP stands for Spark), aim to process large datasets efficiently. Given the iterative nature of the algorithm and its dependence on in-memory data, a non-standard MapReduce paradigm is applied, taking into account several memory and performance problems, e.g., the granularity of the map task, the reduction of shuffling operations, caching, partial data recomputing, and the usage of clusters. The implementations take advantage of Hadoop ecosystem components such as HDFS, YARN, and streaming. Testing is performed in cloud execution environments, using different configurations with up to 128 cores. The performance of the new implementations is evaluated on three known datasets, and the findings are compared with those of a previous U-BRAIN parallel implementation. The results show a speedup of up to 20× with good scalability and reliability in cluster environments.
