Abstract

In many online classification and non-exhaustive learning tasks, it is often impossible to define a training set that covers every class. New classes, as well as novelties caused by data errors, can severely degrade classifier performance. Traditional proximity-based approaches to novelty detection usually rely on a distance measure to quantify the proximity of samples. In this study, we propose a novelty detection framework based on ensemble learning, specifically Random Forest (RF). The framework rests on the observation that an ensemble of classifiers can provide a kind of metric that characterizes different classes and measures their proximity. In particular, we apply ensemble methods with decision trees as base classifiers and present two specific approaches based on random forest, RFV and RFP. RFV uses the vote distribution of the RF on a test sample, and RFP treats the proximity matrix of the RF as a special kernel metric for discovering novelty. The proposed approaches are compared against two common approaches, support vector domain description (SVDD) and the Gaussian mixture model (GMM), on one artificial data set and five benchmark data sets. The experimental results show that the proposed methods achieve better performance in terms of accuracy and recall.
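The two quantities named above, the per-sample vote distribution (RFV) and the forest proximity (RFP), can be illustrated with a minimal Python sketch. The abstract does not state how either quantity is turned into a novelty decision, so the scoring rules below (one minus the largest vote share, and one minus the largest proximity to any training sample) are assumptions made for illustration, as are all function names. The inputs are assumed to be already extracted from a trained forest: one predicted label per tree, and one leaf index per tree.

```python
from collections import Counter


def vote_distribution(tree_votes, classes):
    """Fraction of trees voting for each known class."""
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    return {c: counts.get(c, 0) / n_trees for c in classes}


def rfv_novelty_score(tree_votes, classes):
    """Assumed RFV-style score: 1 minus the largest vote share.

    A sample on which the trees disagree (flat vote distribution)
    gets a high score; a confidently classified sample gets a low one.
    """
    dist = vote_distribution(tree_votes, classes)
    return 1.0 - max(dist.values())


def rf_proximity(leaves_a, leaves_b):
    """Standard RF proximity: fraction of trees in which two samples
    fall into the same leaf."""
    same = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return same / len(leaves_a)


def rfp_novelty_score(test_leaves, train_leaves):
    """Assumed RFP-style score: 1 minus the largest proximity between
    the test sample and any training sample."""
    return 1.0 - max(rf_proximity(test_leaves, t) for t in train_leaves)


if __name__ == "__main__":
    known = ["a", "b", "c"]
    confident = ["a"] * 9 + ["b"]          # trees mostly agree
    uncertain = ["a", "b", "c"] * 2        # trees split evenly
    print(rfv_novelty_score(confident, known))
    print(rfv_novelty_score(uncertain, known))

    train = [[1, 2, 3, 4], [1, 2, 3, 5]]   # leaf index per tree
    print(rfp_novelty_score([1, 2, 3, 4], train))
    print(rfp_novelty_score([9, 9, 9, 9], train))
```

In this sketch a sample would be flagged as novel when its score exceeds a threshold chosen on validation data; that thresholding step is likewise an assumption and not taken from the abstract. With scikit-learn, the leaf indices could be obtained from `RandomForestClassifier.apply` and the per-tree votes from the fitted trees in `estimators_`.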

