Abstract

The application of boosting to both two-class and multi-class classification problems is studied. Five real chemical data sets are used. Each data set is randomly divided into two subsets, one for training and the other for prediction. For two-class classification, the samples in each data set are separated into a high-response class and a low-response class according to a threshold value. For three data sets (the wheat, cream and HIV data), boosting with classification and regression trees (CART) as the base learner can decrease the misclassification rate in prediction relative to a single CART. For the green tea data, however, the results indicate that overfitting may occur when boosting is applied, and for the chromatographic retention data boosting performs worse than a single CART. The cream data and the HIV data are also used for multi-class classification, and both demonstrate that boosting performs better than a single CART in that setting. Variable importance analysis suggests that the improvement made by boosting may be due to the use of more variables, which give more information on special types of samples in the training data.
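The two-class workflow summarized above (threshold the response, split at random, compare a single tree with a boosted ensemble by misclassification rate) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration. The abstract does not name the boosting variant, so discrete AdaBoost is used here. One-split trees (stumps) stand in for full CART, the data are synthetic rather than any of the five chemical data sets, and all function names are hypothetical.

```python
import math
import random

def to_two_class(responses, threshold):
    """Label a sample +1 (high response) if above the threshold, else -1 (low)."""
    return [1 if r > threshold else -1 for r in responses]

def random_split(n, train_fraction, rng):
    """Randomly divide sample indices into a training and a prediction subset."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * train_fraction)
    return idx[:cut], idx[cut:]

def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted class differs from the true class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_stump(X, y, w):
    """Return (weighted error, feature, cut, sign) of the best one-split tree."""
    best = None
    for j in range(len(X[0])):
        for cut in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] > cut else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, cut, sign)
    return best

def stump_predict(stump, row):
    _, j, cut, sign = stump
    return sign if row[j] > cut else -sign

def adaboost(X, y, rounds):
    """Discrete AdaBoost (assumed variant): reweight samples after each stump."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def ensemble_predict(ensemble, row):
    """Weighted vote of all stumps; ties go to the high-response class."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in data: two variables and a noisy response.
    X = [[rng.random(), rng.random()] for _ in range(200)]
    response = [a + b + rng.gauss(0, 0.1) for a, b in X]
    y = to_two_class(response, threshold=1.0)
    train, test = random_split(len(X), 0.7, rng)
    X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
    X_te, y_te = [X[i] for i in test], [y[i] for i in test]
    single = adaboost(X_tr, y_tr, rounds=1)    # one stump only
    boosted = adaboost(X_tr, y_tr, rounds=25)  # boosted ensemble
    for name, model in (("single stump", single), ("boosted stumps", boosted)):
        pred = [ensemble_predict(model, row) for row in X_te]
        print(name, misclassification_rate(y_te, pred))
```

Running the script prints the prediction-set misclassification rate for both models. Whether boosting helps depends on the data, which is consistent with the mixed results reported across the five data sets.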
