Abstract
We propose a new supervised learning algorithm for classification and regression problems where two or more preliminary predictors are available. We introduce KernelCobra, a non-linear learning strategy for combining an arbitrary number of initial predictors. KernelCobra builds on the COBRA algorithm introduced by Biau et al. (2016), which combines estimators based on a notion of proximity between their predictions on the training data. Whereas the COBRA algorithm uses a binary threshold to decide which training data points are close enough to be retained, we generalise this idea by using a kernel to better encapsulate the proximity information. Such a smoothing kernel assigns more representative weights to the training points used to build the final aggregated predictor, and KernelCobra systematically outperforms the COBRA algorithm. While COBRA is intended for regression, KernelCobra handles both classification and regression. KernelCobra is included in the open source Python package Pycobra (0.2.4 and onward), introduced by Srinivasa Desikan (2018). Numerical experiments on real-life and synthetic datasets assess the performance of KernelCobra in terms of both predictive accuracy and computational complexity.
Highlights
In the fields of machine learning and statistical learning, ensemble methods combine several estimators to create a new, superior estimator.
The KernelCobra algorithm introduced in the present paper smooths COBRA's data point selection process by using a kernel-based method to assign weights to the points in the collective.
We focus in the present paper on the introduction of KernelCobra and its variants, and on its implementation in Python.
Summary
In the fields of machine learning and statistical learning, ensemble methods consist of combining several estimators (or predictors) to create a new, superior estimator. Our method, KernelCobra, extends the COBRA (standing for COmbined Regression Alternative) algorithm introduced by Biau et al. [6]. The COBRA algorithm is motivated by the idea that non-linear, data-dependent techniques can provide flexibility not offered by existing (linear) ensemble methods. By using the proximity between predictions on the training data and predictions on test data, training points are selected to form the aggregate. The COBRA algorithm retains a training point only if this proximity is below a data-dependent threshold ε, resulting in a binary decision (either keep the point or discard it). The KernelCobra algorithm introduced in the present paper smooths this data point selection process by introducing a kernel-based method for assigning weights to the various points in the collective.
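To make the contrast concrete, the following is a minimal sketch of the aggregation step in plain NumPy, not the Pycobra API: the names (gaussian_kernel, kernel_cobra_predict, bandwidth) and the choice of a Gaussian kernel over the Euclidean distance between prediction vectors are illustrative assumptions, not the package's actual interface or the exact kernel used in the paper.

```python
# A minimal sketch of kernel-weighted aggregation, assuming a Gaussian kernel
# and pre-computed predictions from M preliminary machines. All names here are
# illustrative, not the Pycobra API.
import numpy as np

def gaussian_kernel(distances, bandwidth=1.0):
    """Turn prediction distances into smooth, non-negative weights."""
    return np.exp(-(distances ** 2) / (2 * bandwidth ** 2))

def kernel_cobra_predict(train_preds, train_targets, query_preds, bandwidth=1.0):
    """Aggregate M preliminary predictors with kernel-smoothed weights.

    train_preds   : (n, M) predictions of the M machines on the n training points
    train_targets : (n,)   observed responses for those training points
    query_preds   : (M,)   predictions of the M machines at the query point
    """
    # Distance between the query's predictions and each training point's
    # predictions, accumulated over the M machines.
    distances = np.linalg.norm(train_preds - query_preds, axis=1)
    # COBRA would keep only points whose proximity falls below a threshold
    # (0/1 weights); KernelCobra replaces that hard cut with smooth weights.
    weights = gaussian_kernel(distances, bandwidth)
    if weights.sum() == 0:
        return train_targets.mean()  # fallback when all weights vanish
    return np.dot(weights, train_targets) / weights.sum()

# Toy usage: two machines, five training points, one query point.
rng = np.random.default_rng(0)
train_preds = rng.normal(size=(5, 2))
train_targets = rng.normal(size=5)
query_preds = rng.normal(size=2)
print(kernel_cobra_predict(train_preds, train_targets, query_preds, bandwidth=0.5))
```

In this sketch, a very small bandwidth recovers behaviour close to COBRA's hard threshold (only near-agreeing training points matter), while larger bandwidths spread weight across more of the collective.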