Abstract
A recently proposed optimal Bayesian classification paradigm addresses optimal error rate analysis for small-sample discrimination, including optimal classifiers, optimal error estimators, and error estimation analysis tools, with respect to the probability of misclassification in binary classification. Here, we address multi-class problems and optimal expected risk with respect to a given risk function, both common settings in bioinformatics. We present Bayesian risk estimators (BRE) under arbitrary classifiers, the mean-square error (MSE) of arbitrary risk estimators under arbitrary classifiers, and optimal Bayesian risk classifiers (OBRC). We provide analytic expressions for these tools under several discrete and Gaussian models and present a new methodology to approximate the BRE and MSE when analytic expressions are not available. Of particular note, we present analytic forms for the MSE under Gaussian models with homoscedastic covariances, which are new even in binary classification.
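To make the BRE, MSE, and OBRC concrete, here is a minimal sketch for one discrete (binned) model. It assumes independent Dirichlet priors on the class probabilities and on each class's bin probabilities; this prior choice, the function names, and the loss convention `lam[i][y]` (the cost of deciding class i when the true class is y) are our illustrative assumptions, not taken from the text above. Under these assumptions the BRE is the posterior mean of the true risk, the conditional MSE of any risk estimate follows from the first two posterior moments of that risk, and the OBRC minimizes the posterior expected risk bin by bin.

```python
# Minimal sketch, ASSUMING a discrete (binned) model with independent Dirichlet
# priors on the class probabilities c and on each class's bin probabilities p^y.
# Function names are hypothetical; lam[i][y] is the loss of deciding i when the truth is y.

def dirichlet_moments(a):
    """Mean vector and second-moment matrix E[p_i p_k] of Dirichlet(a)."""
    a0 = sum(a)
    mean = [ai / a0 for ai in a]
    second = [[(ai * (ai + 1) if i == k else ai * ak) / (a0 * (a0 + 1))
               for k, ak in enumerate(a)] for i, ai in enumerate(a)]
    return mean, second

def posterior(counts, alpha_c, alpha_p):
    """Conjugate update; counts[y][j] = number of training points of class y in bin j."""
    post_c = [a + sum(row) for a, row in zip(alpha_c, counts)]
    post_p = [[a + u for a, u in zip(ar, ur)] for ar, ur in zip(alpha_p, counts)]
    return post_c, post_p

def bre_and_mse(psi, lam, post_c, post_p, risk_hat):
    """BRE of classifier psi (list: bin -> decision) and conditional MSE of risk_hat."""
    n_cls = len(post_c)
    c_mean, c_second = dirichlet_moments(post_c)
    r_mean, r_second = [], []  # moments of r_y = sum_j lam[psi[j]][y] * p_j^y
    for y in range(n_cls):
        w = [lam[psi[j]][y] for j in range(len(post_p[y]))]
        m, s = dirichlet_moments(post_p[y])
        r_mean.append(sum(wj * mj for wj, mj in zip(w, m)))
        r_second.append(sum(w[j] * w[k] * s[j][k]
                            for j in range(len(w)) for k in range(len(w))))
    # True risk R = sum_y c_y * r_y; c and each p^y are independent a posteriori.
    bre = sum(c_mean[y] * r_mean[y] for y in range(n_cls))
    e_r2 = sum(c_second[y][z] * (r_second[y] if y == z else r_mean[y] * r_mean[z])
               for y in range(n_cls) for z in range(n_cls))
    mse = e_r2 - 2 * risk_hat * bre + risk_hat ** 2  # E[(R - risk_hat)^2 | sample]
    return bre, mse

def obrc(lam, post_c, post_p):
    """For each bin, choose the decision with the smallest posterior expected loss."""
    c_mean, _ = dirichlet_moments(post_c)
    p_mean = [dirichlet_moments(a)[0] for a in post_p]
    n_bins = len(post_p[0])
    return [min(range(len(lam)),
                key=lambda i: sum(lam[i][y] * c_mean[y] * p_mean[y][j]
                                  for y in range(len(post_c))))
            for j in range(n_bins)]
```

For example, with two classes, two bins, zero-one loss `lam = [[0, 1], [1, 0]]`, uniform priors, and bin counts `[[3, 1], [1, 3]]`, `obrc` returns `[0, 1]`; the BRE of that classifier is 1/3, and the conditional MSE of an estimate equal to the BRE is 4/231.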
Highlights
Classification in biomedicine is often constrained by small samples so that understanding properties of the error rate is critical to ensure the scientific validity of a designed classifier
Plug-in rules such as linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) accommodate multiple classes and arbitrary loss functions; they essentially assume that the underlying class-conditional densities are Gaussian with equal or unequal covariances, respectively
The analytic form that we provide for the mean-square error (MSE) of arbitrary error estimators under homoscedastic models is completely new, with no analog in prior work on binary classification under zero-one loss
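Why the MSE of an arbitrary estimator is tractable can be seen from the standard bias-variance identity, written here in our own notation (an assumption, not necessarily the paper's): π* is the posterior over model parameters θ given the sample S, R(θ, ψ) is the true risk of classifier ψ, and R̂ is any risk estimate computed from S. The BRE is the posterior mean of the true risk, so it minimizes the conditional MSE, and the MSE of any other estimator needs only the first two posterior moments of the true risk.

```latex
\begin{aligned}
\hat{R}_{\mathrm{BRE}} &= \mathrm{E}_{\pi^*}\!\left[R(\theta,\psi)\right],\\
\mathrm{MSE}(\hat{R} \mid S) &= \mathrm{E}_{\pi^*}\!\left[\bigl(R(\theta,\psi) - \hat{R}\bigr)^2\right]
  = \mathrm{E}_{\pi^*}\!\left[R(\theta,\psi)^2\right] - 2\hat{R}\,\hat{R}_{\mathrm{BRE}} + \hat{R}^2 \\
  &= \mathrm{Var}_{\pi^*}\!\left(R(\theta,\psi)\right) + \bigl(\hat{R} - \hat{R}_{\mathrm{BRE}}\bigr)^2 .
\end{aligned}
```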
Summary
Classification in biomedicine is often constrained by small samples, so understanding the properties of the error rate is critical to ensure the scientific validity of a designed classifier. A few classical classification algorithms naturally permit multiple classes and arbitrary loss functions; for example, a plug-in rule takes the functional form of an optimal Bayes decision rule under a given modeling assumption and substitutes sample estimates of the model parameters in place of the true parameters. This can be done with linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) for multiple classes with arbitrary loss functions, which essentially assume that the underlying class-conditional densities are Gaussian with equal or unequal covariances, respectively. We present a new computationally efficient method to approximate the conditional MSE based on the effective joint density.
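The plug-in idea can be illustrated with a univariate sketch. This is our simplification: LDA and QDA proper use multivariate covariance matrices, and the helper names and sample data here are hypothetical. Sample class proportions, means, and variances (pooled across classes for the LDA-style fit, per class for the QDA-style fit) replace the true parameters in the Bayes decision rule, which under a loss `loss[i][y]` picks the decision i minimizing the sum over y of `loss[i][y]` times the class prior times the Gaussian density at x.

```python
import math
from statistics import fmean, variance

# Univariate plug-in sketch (a simplification of LDA/QDA); names and data are hypothetical.

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_plugin(samples, pooled):
    """samples maps class label -> list of 1-D observations (>= 2 per class).
    pooled=True gives the LDA-style equal-variance fit, pooled=False the QDA-style fit."""
    n = sum(len(v) for v in samples.values())
    priors = {y: len(v) / n for y, v in samples.items()}
    means = {y: fmean(v) for y, v in samples.items()}
    if pooled:
        ss = sum((x - means[y]) ** 2 for y, v in samples.items() for x in v)
        pooled_var = ss / (n - len(samples))  # n - M degrees of freedom
        variances = {y: pooled_var for y in samples}
    else:
        variances = {y: variance(v) for y, v in samples.items()}  # unbiased, per class
    return priors, means, variances

def plugin_classify(x, model, loss):
    """Plugged-in Bayes rule: argmin_i sum_y loss[i][y] * prior_y * f(x | y)."""
    priors, means, variances = model
    weights = {y: priors[y] * gaussian_pdf(x, means[y], variances[y]) for y in priors}
    return min(loss, key=lambda i: sum(loss[i][y] * weights[y] for y in weights))

samples = {0: [-1.2, -0.8, -1.0, -1.1], 1: [0.9, 1.1, 1.0, 1.2], 2: [2.9, 3.1, 3.0]}
zero_one = {i: {y: int(i != y) for y in samples} for i in samples}
lda = fit_plugin(samples, pooled=True)
print(plugin_classify(0.0, lda, zero_one))  # 0: class 0 has the nearer mean
costly = {i: dict(row) for i, row in zero_one.items()}
costly[0][1] = 100  # deciding 0 when the truth is 1 is now very expensive
print(plugin_classify(0.0, lda, costly))  # 1
```

The same point x = 0.0 switches from class 0 to class 1 once misclassifying a true class-1 point becomes expensive, which is how an arbitrary loss function reshapes an otherwise unchanged plug-in rule.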
More From: EURASIP Journal on Bioinformatics and Systems Biology