Abstract

Confusion matrices are numerical structures that describe the distribution of errors between different classes or categories in a classification process. From a quality perspective, it is of interest to know whether the confusion between the true class A and the class labelled as B is the same as the confusion between the true class B and the class labelled as A. If it is not, a problem with the classifier, or of identifiability between classes, may exist. In this paper, two statistical methods are considered to deal with this issue. Both focus on the study of the off-diagonal cells in confusion matrices. First, McNemar-type tests of marginal homogeneity are considered, which must be followed by a one-versus-all study for every pair of categories. Second, a Bayesian proposal based on the Dirichlet distribution is introduced, which allows the probabilities of misclassification in a confusion matrix to be assessed. Three applications, including a set of omic data, have been carried out using the R software.
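As a rough illustration of the Bayesian idea, the sketch below places a Dirichlet prior on each row of a confusion matrix and compares the posterior misclassification probabilities in the two directions. The exact model used in the paper is not given in this abstract. The sketch therefore rests on assumptions: rows are true classes, rows are treated as independent multinomials, and the prior is uniform (`alpha = 1`). The matrix `cm` and the function names are hypothetical, and Python is used here only for illustration (the paper's applications use R).

```python
import random

def posterior_mean(row_counts, alpha=1.0):
    """Posterior mean of one row's class probabilities under a
    symmetric Dirichlet(alpha) prior and multinomial counts."""
    total = sum(row_counts) + alpha * len(row_counts)
    return [(c + alpha) / total for c in row_counts]

def sample_dirichlet(params, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    s = sum(draws)
    return [d / s for d in draws]

def prob_directional_bias(cm, a, b, alpha=1.0, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(p_ab > p_ba | data), where p_ab is the
    probability that an item of true class a is labelled as b.
    Assumes rows of cm are true classes and are independent."""
    rng = random.Random(seed)
    post_a = [c + alpha for c in cm[a]]  # posterior parameters, row a
    post_b = [c + alpha for c in cm[b]]  # posterior parameters, row b
    hits = 0
    for _ in range(n_draws):
        p_a = sample_dirichlet(post_a, rng)
        p_b = sample_dirichlet(post_b, rng)
        if p_a[b] > p_b[a]:
            hits += 1
    return hits / n_draws

# Hypothetical 3-class confusion matrix (rows: true class, columns: label).
cm = [
    [50, 12, 3],
    [2, 45, 3],
    [4, 5, 41],
]
print(posterior_mean(cm[0]))
print(prob_directional_bias(cm, 0, 1))
```

A posterior probability close to 0 or 1 would suggest that confusion between the two classes runs mainly in one direction. A value near 0.5 is compatible with symmetric errors.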

Highlights

  • Departamento de Estadística e Investigación Operativa, Facultad de Matemáticas, Universidad de Sevilla

  • Confusion matrices are the standard way of summarizing the performance of a classification method. This is an issue of crucial interest in a variety of applied scientific disciplines, such as geostatistics, data mining, text mining, economics, biomedicine and bioinformatics, to cite only a few

  • By classification bias, we mean a systematic error that occurs between categories in a specific direction


Summary

Introduction

If a classifier is fair or unbiased, classification errors between two given categories A and B must happen randomly; that is, they are expected to occur with approximately the same relative frequency in each direction. Quite often this is not the case, and a systematic error occurs in one direction: the observed value in a cell is considerably greater (or smaller) than that of its symmetric cell in the confusion matrix. If detected, the method of selection of k must be revised. On the other hand, the classification bias may be caused by the existence of a unidirectional confusion between two or more categories, that is, the classes under
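The comparison of a cell with its symmetric counterpart can be sketched with an exact McNemar test on each pair of off-diagonal cells. This is a simplified stand-in, not the authors' full procedure, which combines McNemar-type tests of marginal homogeneity with a one-versus-all study. The sketch assumes rows are true classes and columns are assigned labels. The matrix `cm`, the labels and the function names are hypothetical.

```python
from math import comb

def mcnemar_exact(n_ab, n_ba):
    """Two-sided exact McNemar p-value for one pair of symmetric cells.
    Under symmetry, n_ab follows Binomial(n_ab + n_ba, 0.5)."""
    n = n_ab + n_ba
    if n == 0:
        return 1.0  # no discordant cases, so no evidence of asymmetry
    k = min(n_ab, n_ba)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def pairwise_asymmetry(cm, labels):
    """Exact McNemar p-value for every unordered pair of classes."""
    results = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            results[(labels[i], labels[j])] = mcnemar_exact(cm[i][j], cm[j][i])
    return results

# Hypothetical 3-class confusion matrix (rows: true class, columns: label).
cm = [
    [50, 12, 3],
    [2, 45, 3],
    [4, 5, 41],
]
for pair, p_value in pairwise_asymmetry(cm, ["A", "B", "C"]).items():
    print(pair, round(p_value, 4))
```

A small p-value for a pair indicates that errors between those two classes are unlikely to be symmetric. Because several pairs are tested at once, a multiple-comparison correction would normally be applied before drawing conclusions.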
