Abstract

Boosting is an iterative algorithm that combines simple classification rules with mediocre misclassification error rates to produce a highly accurate classification rule. Stochastic gradient boosting enhances this procedure by incorporating a random mechanism at each boosting step, improving both performance and the speed of generating the ensemble. ada is an R package that implements three popular variants of boosting, together with a version of stochastic gradient boosting. In addition, useful plots for data-analytic purposes are provided, along with an extension to the multi-class case. The algorithms are illustrated with synthetic and real data sets.
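A minimal sketch of how the package described above might be used. This assumes the `ada` package is installed from CRAN; the data set and parameter values are illustrative, not from the paper:

```r
# Illustrative sketch (assumes the 'ada' package is installed from CRAN).
library(ada)

# A small synthetic two-class problem (hypothetical data, for illustration only)
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- factor(ifelse(x1 + x2 + rnorm(n, sd = 0.5) > 0, 1, -1))
dat <- data.frame(y, x1, x2)

# Discrete AdaBoost with 100 boosting iterations; setting bag.frac < 1
# resamples a fraction of the data at each step, giving the stochastic variant.
fit <- ada(y ~ ., data = dat, iter = 100, type = "discrete", bag.frac = 0.5)
fit          # prints a summary including the training error
plot(fit)    # training error as a function of the boosting iteration
```

Swapping `type = "discrete"` for `"real"` or `"gentle"` selects the other boosting variants the abstract mentions.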

Highlights

  • Boosting has proved to be an effective method to improve the performance of base classifiers, both theoretically and empirically

  • The following code shows how to predict with Real AdaBoost

  • Remark: The probability class estimate for any boosting algorithm is defined as P(Y = 1 | x) = exp(F(x)) / (exp(-F(x)) + exp(F(x))), where F(x) is the ensemble score
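The highlights above mention predicting with Real AdaBoost and the class probability estimate; a hedged sketch of what this looks like with the `ada` package (data and parameter values are illustrative):

```r
library(ada)

# Toy two-class data (illustrative)
set.seed(2)
dat   <- data.frame(x1 = rnorm(150), x2 = rnorm(150))
dat$y <- factor(ifelse(dat$x1 - dat$x2 > 0, 1, -1))
train <- dat[1:100, ]
test  <- dat[101:150, ]

# Real AdaBoost: the base learner supplies class probabilities at each step
fit <- ada(y ~ ., data = train, iter = 50, type = "real")

predict(fit, newdata = test)                  # predicted class labels
predict(fit, newdata = test, type = "probs")  # class probability estimates,
# i.e., P(Y = 1 | x) recovered from the ensemble score F(x) via the
# logistic link P(Y = 1 | x) = exp(F(x)) / (exp(-F(x)) + exp(F(x)))
```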


Introduction

Boosting has proved to be an effective method for improving the performance of base classifiers, both theoretically and empirically. In addition to pharmacology, boosting algorithms have been applied across a wide range of areas, including tumor identification and gene expression data [7], proteomics data [24], financial and marketing data [2; 18], fisheries data [17], and microscope imaging data [15]. For many of these applications, ada will be useful, since it implements well-documented tools for assessing variable importance, evaluating training and testing error rates, and viewing pairwise plots of the data. The mboost package has to a large extent similar functionality to the gbm package and, in addition, implements the general gradient boosting framework using regression-based learners. In our experience, these packages are better suited for users who need boosting in models with a continuous or count-type outcome.
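The diagnostic tools mentioned above can be sketched as follows. This is a hedged example assuming the `ada` package's `addtest()`, `varplot()`, and `pairs()` methods; the data are illustrative:

```r
library(ada)

# Synthetic data with one informative and one noise variable (illustrative)
set.seed(3)
dat   <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
dat$y <- factor(ifelse(dat$x1 + 0.5 * dat$x2 > 0, 1, -1))
train <- dat[1:150, ]
test  <- dat[151:200, ]

fit <- ada(y ~ ., data = train, iter = 50)

# Attach a test set so plot() shows training and testing error together
fit <- addtest(fit, test[, c("x1", "x2", "x3")], test$y)
plot(fit, test = TRUE)   # training vs. testing error by boosting iteration

varplot(fit)             # variable importance scores
pairs(fit, train)        # pairwise plots of the most important variables
```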

A brief account of boosting algorithms
Historical perspective
Stochastic boosting
Connection to bagging
Functional structure
Construction of base learners using rpart
Description of the functions available in the ‘ada’ package
Creating an ‘ada’ object
Using an ‘ada’ object
Testing results
Diagnostics and model selection
Solubility data
Stochastic boosting in a multi-class context
Summary and concluding remarks