Abstract

We propose a novel practical method for finding the optimal classifier parameter status corresponding to the Bayes error (the minimum classification error probability) by evaluating estimated class boundaries from the perspective of Bayes boundary-ness. While traditional methods approach classifier optimality through minimization of the estimated classification error probability, we approach it through the optimality of the estimated classification boundaries. The optimal classification boundary consists solely of uncertain samples, for which the posterior probabilities of the two classes separated by the boundary are equal. We refer to this essential characteristic of the boundary as “Bayes boundary-ness” and use it to measure how close an estimated boundary is to the optimal one. Our proposed method reaches the optimal parameter status with a single training run over the given data, in contrast to traditional methods such as Cross-Validation (CV), which demand separate validation data and often require many repetitions of training and validation. Moreover, it can be directly applied to any type of classifier, and potentially to any type of sample. In this paper, we first elaborate on our proposed method, which implements the Bayes boundary-ness with an entropy-based uncertainty measure. Next, we analyze the mathematical characteristics of the adopted uncertainty measure. Finally, we evaluate the method through a systematic experimental comparison with CV-based Bayes boundary estimation, which is known to be highly reliable for Bayes error estimation. The analysis rigorously shows the theoretical validity of our adopted uncertainty measure, and the experiments successfully demonstrate that our method closely approximates the CV-based Bayes boundary estimate, and its corresponding classifier parameter status, with only a single-shot training over the data in hand.
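To make the boundary criterion concrete, the following is a minimal sketch (not the authors' implementation) of an entropy-based uncertainty measure for a two-class problem; the function name and the example posterior values are illustrative assumptions.

```python
# Sketch of an entropy-based uncertainty measure for two classes.
# Assumes p1 holds estimates of P(class 1 | x); everything here is
# illustrative, not the paper's actual implementation.
import numpy as np

def entropy_uncertainty(p1: np.ndarray) -> np.ndarray:
    """Shannon entropy of the two-class posterior (p1, 1 - p1).

    The entropy is maximal (log 2) exactly when p1 == 0.5, i.e. when the
    posteriors of the two classes are equal -- the defining property of a
    sample lying on the Bayes boundary ("Bayes boundary-ness").
    """
    p1 = np.clip(p1, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -(p1 * np.log(p1) + (1 - p1) * np.log(1 - p1))

# Confident samples score low; a sample on the boundary scores highest:
print(entropy_uncertainty(np.array([0.99, 0.7, 0.5])))
# -> approx. [0.056, 0.611, 0.693]; 0.693 ~ log 2, the maximum at p1 = 0.5
```

In this sketch, scanning candidate boundaries and preferring the one whose samples score uniformly high under this measure would correspond to searching for the most "Bayes-boundary-like" estimate.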

Highlights

  • In the statistical approach to the development of pattern classifiers, the ultimate goal of classifier training is to find the optimal classifier parameter status that leads to the minimum classification error probability

  • We introduce a new method for evaluating a general form of classifier

  • The purpose is to define a new way of selecting the optimal classifier parameter status that overcomes the fundamental limitations of the standard methods



Introduction

In the statistical approach to the development of pattern classifiers, the ultimate goal of classifier training is to find the optimal classifier parameter (class model parameter) status that leads to the minimum classification error probability (called the Bayes error). Many classifier training methods have been vigorously investigated to achieve this goal through accurate estimation of the error probability. The Hold-Out (HO) method splits a given sample set into a training subset and a validation subset, and estimates the error probability (or classifier status) using this pair of subsets. This raises the issue of how the given samples should be split: any split inevitably reduces the number of either training or validation samples, degrading the error probability estimate. Cross-Validation (CV), Leave-One-Out CV (LOO-CV), and the Bootstrap give an accurate and reliable estimate of the error probability, whose quality depends on the number of resampling repetitions. CV alleviates the above degradation issue by producing multiple pairs of training and validation subsets, as illustrated in the sketch below.
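As a hedged illustration of the CV-based selection described above (not code from the paper), the sketch below picks a classifier parameter status by K-fold CV; the SVM classifier, its gamma grid, and the synthetic data are placeholder assumptions.

```python
# K-fold CV for selecting a classifier parameter status: each candidate is
# retrained K times on different train/validation splits, and the status
# with the lowest estimated error probability is kept. The classifier,
# parameter grid, and data below are placeholder assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

best_gamma, best_err = None, np.inf
for gamma in [0.01, 0.1, 1.0, 10.0]:                     # candidate parameter statuses
    acc = cross_val_score(SVC(gamma=gamma), X, y, cv=5)  # 5 train/validation pairs
    err = 1.0 - acc.mean()                               # CV estimate of error probability
    if err < best_err:
        best_gamma, best_err = gamma, err

print(f"selected gamma={best_gamma}, estimated error={best_err:.3f}")
# Note the cost: |grid| x K = 4 x 5 = 20 training runs, in contrast to the
# single-shot training over the data in hand that the proposed
# boundary-based method requires.
```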
