Abstract

This paper provides a unified description of Widening, a framework for using parallel (or otherwise abundant) computational resources to improve model quality. We discuss different theoretical approaches to Widening with and without consideration of diversity. We then soften some of the underlying constraints so that Widening can be implemented in real-world algorithms. We summarize earlier experimental results demonstrating the potential impact as well as promising implementation strategies before concluding with a survey of related work.

Highlights

  • In particular, we examine explicit partitions of the model space and distinguish partitions that are closed under refinement from those that are only weakly closed, i.e. closed once the selection operator has been applied as well

  • We then introduce the notion of path-based Widening, which relies on the selection operator to implicitly segment the model space

  • We discuss an aggregate of earlier results, highlighting potential pitfalls and providing an intuition for the different approaches to realizing Widening in practice (a minimal sketch of the basic widened search loop follows this list)
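
The sketch below contrasts a standard greedy search with the basic top-k widened search loop these highlights refer to. It is a minimal illustration, not the paper's implementation: `refine` and `score` are placeholders for a learner's refinement operator and quality measure, and all names are chosen here for illustration.

```python
# Minimal sketch (illustrative, not the paper's code): greedy search versus
# a basic top-k widened search loop. `refine` enumerates refinements of a
# model and `score` measures model quality; both are supplied by a concrete
# learner.

def greedy_search(start, refine, score, steps):
    """Classic greedy heuristic: keep only the single best refinement."""
    model = start
    for _ in range(steps):
        candidates = refine(model)
        if not candidates:
            break
        model = max(candidates, key=score)
    return model


def top_k_widening(start, refine, score, steps, k):
    """Top-k Widening: refine k models in parallel and keep the k best
    refinements at every step; return the best model found at the end."""
    frontier = [start]
    for _ in range(steps):
        candidates = [m for parent in frontier for m in refine(parent)]
        if not candidates:
            break
        frontier = sorted(candidates, key=score, reverse=True)[:k]
    return max(frontier, key=score)
```

With k = 1 the widened loop degenerates to the greedy search; larger k invests additional parallel work into a broader exploration of the model space rather than into speed-up.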

Summary

Motivation and introduction

The trend to add more cores to modern processors and the growing popularity of cloud-based compute resources have increased the importance of parallel algorithm development. Instead of using these resources solely to speed up existing algorithms, our goal is to improve model quality without increasing the overall time spent, by investing parallel resources into a better exploration of the (model) search space. These types of search problems are widespread in machine learning and data mining, with models relying on numerical parameters that need optimization, discrete models that turn this into a combinatorial search problem, and sometimes a hybrid of both. We provide a formalization of Widening that combines, expands, and unifies earlier publications (Akbar et al. 2012; Ivanova and Berthold 2013) describing a number of ideal methods for Widening this type of search and reducing the impact of the greedy heuristic. These choices differ in how they widen the search with various partitioning methods.
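
As an intuition for the partitioning-based variants, the sketch below shows one possible realization of path-based Widening with an implicit, communication-free partitioning: each parallel path only prefers refinements whose hash falls into its own bucket. The bucket assignment via a hash of the model's textual representation is an assumption made for illustration and is not the exact selector studied in the paper.

```python
import hashlib

# Illustrative sketch: path-based Widening with an implicit partitioning of
# the model space. Each of k greedy paths prefers refinements hashed into its
# own bucket and falls back to all refinements only if its bucket is empty,
# so the paths spread out without any communication. The hashing scheme and
# function names are assumptions for illustration only.

def _bucket(model, k):
    """Assign a model to one of k buckets via a stable hash of its repr."""
    digest = hashlib.sha256(repr(model).encode()).hexdigest()
    return int(digest, 16) % k


def path_based_widening(start, refine, score, steps, k):
    """Follow k independent, implicitly diversified greedy paths and
    return the best model found across all of them."""
    best = start
    for path in range(k):  # each iteration stands in for one parallel worker
        model = start
        for _ in range(steps):
            candidates = refine(model)
            if not candidates:
                break
            own = [m for m in candidates if _bucket(m, k) == path]
            model = max(own or candidates, key=score)
        if score(model) > score(best):
            best = model
    return best
```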

Preliminaries
Selection and refinement
Widening
Top-k widening
Ideal partitioning for widening
Approximate partitioning for widening
Path-based widening
Diversity-driven widening
Ideal diversity-driven widening
Randomized diversity-driven widening
Summary
Practical considerations and experimental insights
Injecting diversity
Explicit diversity: diverse top-k
Explicit diversity: data- versus model-driven diverse top-k
Implicit diversity
Implicit diversity: hashed bucket selector
Experimental insights
Runtime observations
Lessons learned
Related work
Machine learning algorithm parallelization
Speed-up through parallelization
Specific frameworks
Model quality improvement
Look ahead strategies
Ensemble learning
Meta heuristics
Monte Carlo tree search
Federated learning
Greedy search algorithm improvement
Parallel local search
Communication reduction
Findings