Abstract
The average probability of error is used to demonstrate the performance of a Bayesian classification test (referred to as the Combined Bayes Test (CBT)) when the training data of each class are mislabeled. The CBT combines the information in discrete training and test data to infer symbol probabilities, where a uniform Dirichlet prior (i.e., a noninformative prior of complete ignorance) is assumed for all classes. Using the CBT, classification performance is shown to degrade when mislabeling exists in the training data, with a severity that depends on the mislabeling probabilities. It is further shown that as the mislabeling probabilities increase, M∗, the best quantization fineness associated with the Hughes phenomenon of pattern recognition, also increases. Notably, even when the actual mislabeling probabilities are known to the CBT, the classification performance obtainable without mislabeling cannot be achieved. However, the negative effect of mislabeling can be diminished, with greater success for smaller mislabeling probabilities, if a data reduction method called the Bayesian Data Reduction Algorithm (BDRA) is applied to the training data.
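As an illustrative sketch of the symbol-probability inference described above (the paper's exact CBT likelihood may differ), a uniform Dirichlet prior over the M quantized symbols gives, for a class with training counts $n_1, \ldots, n_M$ and $N = \sum_i n_i$, the Laplace-smoothed posterior mean estimate
$$\hat{p}_i \;=\; \frac{n_i + 1}{N + M},$$
so every symbol retains nonzero probability even if it never appears in the (possibly mislabeled) training data; under this reading, each class's likelihood for the test counts would be built from such Dirichlet-based estimates.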