Block size, parallelism and predictive performance: finding the sweet spot in distributed learning

Filipe Oliveira, Davide Carneiro, Miguel Guimarães, Óscar Oliveira, Paulo Novais

doi:10.1080/17445760.2023.2225854

Journal: International Journal of Parallel, Emergent and Distributed Systems

Publication Date: Jun 28, 2023

Affiliation: Instituto Superior de Contabilidade e Administração do Porto, University of Minho

#Block Size #Real-time Delivery

Abstract

As distributed and multi-organization Machine Learning emerges, new challenges must be solved, such as diverse and low-quality data or real-time delivery. In this paper, we use a distributed learning environment to analyze the relationship between block size, parallelism, and predictor quality. Specifically, the goal is to find the optimum block size and the best heuristic to create distributed Ensembles. We evaluated three different heuristics and five block sizes on four publicly available datasets. Results show that using fewer but better base models matches or outperforms a standard Random Forest, and that 32 MB is the best block size.
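
The general idea of building an ensemble from per-block base models and keeping only the better ones can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: blocks are measured in rows rather than megabytes, the base model is a simple decision stump, the data is synthetic, and the "keep the top k by validation accuracy" rule is a hypothetical stand-in that is not necessarily one of the three heuristics evaluated in the paper.

```python
import random
from collections import Counter

def make_blocks(rows, block_size):
    """Split rows into consecutive blocks of at most block_size rows."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]

def train_stump(block):
    """Fit a one-feature threshold classifier (decision stump) on one block."""
    best = None
    for f in range(len(block[0][0])):
        for x, _ in block:
            t = x[f]  # candidate threshold taken from the data
            for sign in (1, -1):
                hits = sum((1 if sign * (xi[f] - t) > 0 else 0) == yi
                           for xi, yi in block)
                if best is None or hits > best[0]:
                    best = (hits, f, t, sign)
    _, f, t, sign = best
    return lambda x: 1 if sign * (x[f] - t) > 0 else 0

def accuracy(model, rows):
    """Fraction of rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def select_top_k(models, validation, k):
    """Hypothetical heuristic: keep the k models with the best validation accuracy."""
    ranked = sorted(models, key=lambda m: accuracy(m, validation), reverse=True)
    return ranked[:k]

def ensemble_predict(models, x):
    """Majority vote over the selected base models."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Synthetic binary task (assumption): the label depends only on the first feature.
rng = random.Random(0)
data = []
for _ in range(600):
    a, b = rng.random(), rng.random()
    data.append(((a, b), 1 if a > 0.5 else 0))
train, validation, test = data[:400], data[400:500], data[500:]

blocks = make_blocks(train, block_size=50)        # one base model per block
models = [train_stump(block) for block in blocks]
selected = select_top_k(models, validation, k=3)  # fewer, better base models

ensemble_acc = sum(ensemble_predict(selected, x) == y for x, y in test) / len(test)
print(f"{len(blocks)} blocks, {len(selected)} models kept, test accuracy {ensemble_acc:.2f}")
```

In this sketch, a smaller `block_size` yields more blocks that could be trained in parallel, but each base model sees less data. That is the trade-off between parallelism and predictor quality the abstract refers to.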

Full Text