Abstract

Ensemble models refer to methods that combine a typically large number of weak learners into a stronger composite model. The output of an ensemble method results from fitting a base learning algorithm to a given data set and obtaining diverse answers by re-weighting the observations or by re-sampling them according to a given probabilistic selection scheme. A key challenge of using ensembles on large-scale, multidimensional data lies in the complexity and the computational burden associated with them. The models created by ensembles are often difficult, if not impossible, to interpret, and their implementation requires more computational power than individual learning algorithms. Recent research effort in the field has concentrated on reducing ensemble size while maintaining predictive accuracy. We propose a method to prune an ensemble solution by optimizing its margin distribution while increasing its diversity. The proposed algorithm results in an ensemble that uses only a fraction of the original weak learners, with generally improved estimated generalization performance. We analyze and test our method on both synthetic and real data sets. The analysis shows that the proposed method compares favorably to the original ensemble solutions and to other existing ensemble pruning methodologies.
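
For context, the sketch below fits a standard boosting ensemble with scikit-learn and computes the margin of every training observation under the ensemble's voting weights. The data set, the number of weak learners, and the percentile that is printed are illustrative choices and are not taken from the paper.

```python
# Illustrative only: a boosting ensemble and its margin distribution.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
ens = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X, y)

# Per-learner predictions mapped to {-1, +1}; rows = weak learners, columns = samples.
H = np.array([2 * h.predict(X) - 1 for h in ens.estimators_])
w = ens.estimator_weights_[: len(ens.estimators_)]   # guard against early stopping
w = w / w.sum()                                      # normalized voting weights
margins = (2 * y - 1) * (w @ H)                      # per-observation margin in [-1, 1]

print("5th percentile of the margin distribution:", np.percentile(margins, 5))
```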

Highlights

  • Ensemble methods combine a large number of fitted values (sometimes in the hundreds) into a composite prediction

  • Although ensembles generally perform strongly in terms of generalization ability compared to individual classifiers, their application to large-scale, high-velocity data sets creates challenges given the more complex nature of these learning algorithms


Summary

INTRODUCTION

Ensemble methods combine a large number of fitted values (sometimes in the hundreds) into a composite prediction. The term boosting refers to a family of methods that combine weak learners (classification algorithms that perform at least slightly better than random) into a strong-performing ensemble through weighted voting. Interpretations of ensemble predictions are not as straightforward as those of single learning algorithms, and the implementation of the resulting models requires fitting the data through all of the iterations (sometimes in the hundreds) of the ensemble. A high number of iterations is oftentimes necessary to reap the benefits of the improved generalization performance provided by ensembles [5], [7]. For this reason, recent research effort has concentrated on reducing ensemble size, a task called ensemble pruning (thinning), while trying to maintain or improve predictive accuracy (see, e.g., [18]–[27]). In this article we propose an algorithm that produces a reduced, strong-performing sub-ensemble by optimizing the diversity of the weak learners and maximizing the lower tail of its margin distribution. The proposed method is a weight-based quadratic optimization formulation that aims to tune the weights of a given ensemble such that the pairwise correlations of the weak learners and the margin variance are minimized, while the lower percentiles of the margin distribution of the ensemble are maximized.
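
The exact formulation appears later in the paper; as a rough, hedged illustration of the idea described above, the following sketch solves a small quadratic program over the ensemble weights. The choice of objective (margin variance plus a linear reward on the lowest-margin observations), the `lambda_` trade-off, the `lower_frac` proxy for the lower percentiles, and the thresholding rule used to drop learners are assumptions made for this example, not the authors' QMM algorithm.

```python
# Hedged sketch of a weight-based quadratic pruning step; the objective and
# pruning rule below are illustrative assumptions, not the paper's QMM method.
import numpy as np
from scipy.optimize import minimize

def prune_by_margin_qp(H, y, lambda_=1.0, lower_frac=0.1, keep_tol=1e-3):
    """H: (T, n) weak-learner predictions in {-1, +1}; y: (n,) labels in {-1, +1}.
    Returns the indices of the retained learners and their renormalized weights."""
    T, n = H.shape
    M = H * y                        # per-learner margins: +1 if correct, -1 if wrong
    Q = np.cov(M)                    # w @ Q @ w = variance of the ensemble margin;
                                     # off-diagonal terms penalize correlated learners
    base = M.mean(axis=0)            # equal-weight ensemble margin of each observation
    hard = np.argsort(base)[: max(1, int(lower_frac * n))]
    c = M[:, hard].mean(axis=1)      # reward learners that lift the lowest margins

    def objective(w):
        return w @ Q @ w - lambda_ * (c @ w)

    w0 = np.full(T, 1.0 / T)
    res = minimize(objective, w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * T,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    keep = np.flatnonzero(res.x > keep_tol)   # prune learners with negligible weight
    return keep, res.x[keep] / res.x[keep].sum()
```

With the toy ensemble from the earlier sketch, calling `prune_by_margin_qp(H, 2 * y - 1)` would return a sub-ensemble whose retained learners and weights can then be compared against the full ensemble.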

PRELIMINARIES
BOOSTING ALGORITHMS
DIVERSITY AND ENSEMBLE PERFORMANCE
ENSEMBLES UNDER NOISE
SELECTION-BASED METHODS
PROPOSED PRUNING ALGORITHM
EXPERIMENTS AND SIMULATIONS
QMM PERFORMANCE ON BENCHMARK DATA SETS
Findings
CONCLUSIONS