Abstract

Regularized linear classifiers have been successfully applied in undersampled, i.e. small sample size/high dimensionality biomedical classification problems. Additionally, a design of data complexity measures was proposed in order to assess the competence of a classifier in a particular context. Our work was motivated by the analysis of ill-posed regression problems by Elden and the interpretation of linear discriminant analysis as a mean square error classifier. Using Singular Value Decomposition analysis, we define a discriminatory power spectrum and show that it provides useful means of data complexity assessment for undersampled classification problems. In five real-life biomedical data sets of increasing difficulty we demonstrate how the data complexity of a classification problem can be related to the performance of regularized linear classifiers. We show that the concentration of the discriminatory power manifested in the discriminatory power spectrum is a deciding factor for the success of the regularized linear classifiers in undersampled classification problems. As a practical outcome of our work, the proposed data complexity assessment may facilitate the choice of a classifier for a given undersampled problem.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call