Abstract

The purpose of ensemble methods is to increase prediction accuracy by combining many classifiers. Recent studies have shown that random forests and forward stagewise regression achieve good accuracy in classification problems. However, they have large prediction errors near the separation boundary because they use decision trees as the base learner. In this study, we use kernel ridge regression instead of decision trees in random forests and boosting. The usefulness of the proposed ensemble methods is shown by simulation results on the prostate cancer and the Boston housing data.

Keywords: Boosting, ensemble method, forward stagewise regression, kernel ridge regression, random forest.

1. Introduction

Ensemble methods are learning algorithms that predict a new observation's response by taking a (weighted) vote or average of the predictions of a set of classifiers. The first ensemble method was the one using Bayesian averaging. Three other well-known ensemble methods are bagging (Breiman, 1996), boosting (Freund and Schapire, 1997), and random forests (RF) (Breiman, 2001), which are based on decision trees. In particular, boosting and random forests are well known for their excellent prediction accuracy in classification problems.

Bagging is a technique for reducing the variance of an estimated prediction function, and it seems to work especially well for high-variance, low-bias procedures such as decision trees. For regression, we simply fit the same regression tree many times to bootstrap-sampled versions of the training data and average the results. For classification, each tree in the set casts a vote for the predicted class.
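The bagging procedure described above, combined with the paper's idea of using kernel ridge regression as the base learner, can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the RBF kernel, the hyperparameter values (`lam`, `gamma`), and the helper names are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelRidge:
    # Closed-form kernel ridge regression: alpha = (K + lam*I)^{-1} y
    def __init__(self, lam=1.0, gamma=1.0):
        self.lam, self.gamma = lam, gamma

    def fit(self, X, y):
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), y)
        return self

    def predict(self, Xnew):
        # f(x) = sum_i alpha_i k(x_i, x)
        return rbf_kernel(Xnew, self.X, self.gamma) @ self.alpha

def bagged_predict(X, y, Xnew, n_estimators=25, seed=0):
    # Bagging for regression: fit the base learner on bootstrap
    # resamples of the training data, then average the predictions.
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample
        model = KernelRidge(lam=0.1, gamma=0.5).fit(X[idx], y[idx])
        preds.append(model.predict(Xnew))
    return np.mean(preds, axis=0)
```

Swapping `KernelRidge` for a regression tree recovers standard bagging; keeping it gives the smoother base learner the paper proposes, which avoids the piecewise-constant boundary behavior of trees.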
