On Optimizing Ensemble Models using Column Generation

Vanya Aziz, Eligius M T Hendrix, Jan Kronqvist, Ivo Nowak, Ouyang Wu

doi:10.1007/s10957-024-02391-9

Abstract

In recent years, interest has grown in integrating optimization algorithms into machine learning. We study the potential of ensemble learning in classification tasks and how to efficiently decompose the underlying optimization problem. Ensemble learning has become popular for machine learning applications, and it is particularly interesting from an optimization perspective due to its resemblance to column generation. The challenge for learning is not only to obtain a good fit for the training data set, but also to generalize well, such that the classifier is generally applicable. Deep networks have the drawback that they require a lot of computational effort to reach an accurate classification. Ensemble learning can combine various weak learners, which individually require less computational time. We consider binary classification problems and study a three-phase algorithm. After initializing a set of base learners, which is refined by a bootstrapping approach, new base learners are generated by first solving a linear programming (LP) master problem and then solving a machine learning sub-problem on a reduced data set, which can be viewed as a so-called pricing problem. We show theoretically that the algorithm computes an optimal ensemble model in the convex hull of a given model space. The implementation of the algorithm is part of an ensemble learning framework called decolearn. Numerical experiments with the CIFAR-10 data set show that the base learners are diverse and that both the training and generalization error are reduced after several iterations.
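The interplay between an LP master problem and a pricing sub-problem can be illustrated with a minimal sketch. It is not the decolearn implementation and does not reproduce the paper's three-phase algorithm: it assumes an LPBoost-style master problem with a mean hinge slack objective, decision stumps as the model space, and hypothetical helper names (`solve_master`, `price_stump`, `stump_predict`, `column_generation`). The master LP chooses convex weights over the current base learners; its dual values act as sample weights, and the pricing step looks for a stump with negative reduced cost using only the samples whose dual weight is positive, which is one possible reading of the "reduced data set".

```python
import numpy as np
from scipy.optimize import linprog


def solve_master(H, y):
    """Restricted LP master: convex weights w minimizing the mean hinge slack."""
    n, m = H.shape
    c = np.concatenate([np.zeros(m), np.full(n, 1.0 / n)])  # variables: [w, xi]
    # Margin constraints y_i * sum_j w_j H_ij + xi_i >= 1, written as A_ub x <= b_ub.
    A_ub = np.hstack([-(y[:, None] * H), -np.eye(n)])
    b_ub = -np.ones(n)
    A_eq = np.concatenate([np.ones(m), np.zeros(n)])[None, :]  # sum_j w_j = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=(0, None), method="highs")
    u = -res.ineqlin.marginals     # dual sample weights, u_i >= 0
    beta = res.eqlin.marginals[0]  # dual value of the convexity constraint
    return res.x[:m], res.fun, u, beta


def stump_predict(stump, X):
    """Evaluate a stump (feature, threshold, sign) with outputs in {-1, +1}."""
    f, t, s = stump
    return s * np.where(X[:, f] <= t, 1.0, -1.0)


def price_stump(X, y, u):
    """Pricing: stump maximizing sum_i u_i y_i h(x_i) over samples with u_i > 0."""
    active = u > 1e-12
    Xa, ga = X[active], (u * y)[active]
    best = (-np.inf, None)
    for f in range(X.shape[1]):
        for t in np.unique(Xa[:, f]):
            score = ga @ np.where(Xa[:, f] <= t, 1.0, -1.0)
            for s, val in ((1.0, score), (-1.0, -score)):
                if val > best[0]:
                    best = (val, (f, t, s))
    return best


def column_generation(X, y, max_iter=20, tol=1e-6):
    """Alternate between the master LP and the pricing step."""
    stumps = [(0, np.median(X[:, 0]), 1.0)]  # arbitrary initial base learner
    history = []
    for _ in range(max_iter):
        H = np.column_stack([stump_predict(s, X) for s in stumps])
        w, obj, u, beta = solve_master(H, y)
        history.append(obj)
        score, stump = price_stump(X, y, u)
        # A new column improves the master only if its reduced cost is negative,
        # i.e. if score + beta > 0.
        if stump is None or score + beta <= tol:
            break
        stumps.append(stump)
    # Keep only the stumps that have a weight from the last master solve.
    return stumps[:len(w)], w, history


# Usage on synthetic data (an assumption for illustration, not CIFAR-10).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
stumps, w, history = column_generation(X, y)
print(len(stumps), history[0], history[-1])
```

Because every master problem contains all columns of the previous one, the recorded objective values in `history` cannot increase from one iteration to the next. The loop stops once no stump has a negative reduced cost, which in this restricted stump space means the current weights are optimal over the convex hull of all stumps considered by the pricing step.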

Full Text