Abstract
Boosting is an iterative algorithm that combines simple classification rules, each with mediocre misclassification error rates, into a highly accurate classification rule. Stochastic gradient boosting enhances this procedure by incorporating a random mechanism at each boosting step, improving both accuracy and the speed of generating the ensemble. ada is an R package that implements three popular variants of boosting, together with a version of stochastic gradient boosting. In addition, useful plots for data analytic purposes are provided, along with an extension to the multi-class case. The algorithms are illustrated with synthetic and real data sets.
Highlights
Boosting has proved to be an effective method to improve the performance of base classifiers, both theoretically and empirically.
The following code shows how to predict with Real AdaBoost:
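A minimal sketch of fitting a Real AdaBoost model with ada and predicting on held-out data. The data set, train/test split, and parameter values here are illustrative assumptions, not the paper's own example; see `?predict.ada` for the exact names of the prediction `type` options.

```r
# Sketch: Real AdaBoost with the ada package (assumes ada and rpart are installed)
library(ada)

data(iris)
# ada targets two-class problems, so keep two species (illustrative choice)
iris2 <- iris[iris$Species != "setosa", ]
iris2$Species <- factor(iris2$Species)

set.seed(1)
train <- sample(nrow(iris2), 70)

# type = "real" selects the Real AdaBoost variant; iter is the number of boosting steps
fit <- ada(Species ~ ., data = iris2[train, ], type = "real", iter = 50)

# Predicted class labels for the held-out rows
pred_class <- predict(fit, newdata = iris2[-train, ])

# Class probability estimates (option name per predict.ada's type argument)
pred_prob <- predict(fit, newdata = iris2[-train, ], type = "prob")
```

The same `predict` call works for the Discrete and Gentle AdaBoost variants; only the `type` argument to `ada` changes.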
Remark: The probability class estimate for any boosting algorithm is defined as P(Y = 1 | x) = e^{F(x)} / (e^{-F(x)} + e^{F(x)}), where F(x) denotes the ensemble's score function; this is equivalent to 1 / (1 + e^{-2 F(x)}).
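The logistic transform in the remark can be sketched directly; the helper name `boost_prob` is an illustrative assumption, not part of the ada API.

```r
# Sketch: converting an ensemble score F(x) into a class probability,
# following P(Y = 1 | x) = e^F / (e^-F + e^F) = 1 / (1 + e^(-2F))
boost_prob <- function(F) 1 / (1 + exp(-2 * F))

boost_prob(0)   # a score of 0 maps to probability 0.5
boost_prob(3)   # a large positive score maps to a probability near 1
```

Note that the factor of 2 in the exponent reflects the boosting convention of scoring on the {-1, +1} label scale, so these estimates are monotone in F but can be poorly calibrated near 0 and 1.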
Summary
Boosting has proved to be an effective method to improve the performance of base classifiers, both theoretically and empirically. In addition to pharmacology, boosting algorithms have been applied across a wide range of domains, including tumor identification and gene expression data [7], proteomics data [24], financial and marketing data [2; 18], fisheries data [17], and microscope imaging data [15]. For many of these applications, ada will be useful, since it implements well-documented tools for assessing variable importance, evaluating training and testing error rates, and viewing pairwise plots of the data. The mboost package has, to a large extent, functionality similar to the gbm package and, in addition, implements the general gradient boosting framework using regression-based learners. In our experience, these packages are better suited for users who need boosting in models with a continuous or count-type outcome.