Abstract

In this paper, we apply one-class classification to diagnostic pattern recognition on gene-expression profile data. Unlike conventional multiclass classifiers, a one-class (OC) classifier is trained on samples from a single class only. Ideally, it accepts samples from the class used for training and rejects samples from all other classes. We evaluate six OC classifiers: the Gaussian model, Parzen windows, support vector data description (with two types of kernels: inner product and Gaussian), nearest neighbor data description, K-means, and PCA. The evaluation uses three gene-expression profile data sets: an SRBCT data set, a Colon data set, and a Leukemia data set. Provided that the training and test samples are split well and features are selected appropriately, most OC classifiers produce high-quality results. Parzen windows and support vector data description are "over-strict" in most cases, while nearest neighbor data description is "over-loose". The other classifiers fall between these two extremes. The main difficulty for an OC classifier is obtaining an optimal decision threshold when only a limited number of training samples is available.
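
To make the accept/reject behavior and the role of the decision threshold concrete, the following is a minimal sketch of a Gaussian-model OC classifier. It is not the paper's implementation: the diagonal covariance, the quantile-based threshold, the `reject_fraction` parameter, the function names, and the synthetic two-feature data are all assumptions made for illustration.

```python
import math
import random
import statistics


def fit_gaussian_oc(train, reject_fraction=0.1):
    """Fit a diagonal-covariance Gaussian to target-class samples only.

    Assumption: the threshold is chosen so that roughly `reject_fraction`
    of the training samples fall outside it. Other rules are possible.
    """
    dims = list(zip(*train))
    means = [statistics.fmean(d) for d in dims]
    # A small floor avoids division by zero for constant features.
    variances = [max(statistics.pvariance(d), 1e-9) for d in dims]

    def distance(x):
        # Squared Mahalanobis distance under a diagonal covariance.
        return sum((xi - m) ** 2 / v for xi, m, v in zip(x, means, variances))

    dists = sorted(distance(x) for x in train)
    cut = min(len(dists) - 1, math.ceil((1 - reject_fraction) * len(dists)) - 1)
    return distance, dists[cut]


def accepts(model, x):
    """Return True if x is accepted as a member of the target class."""
    distance, threshold = model
    return distance(x) <= threshold


# Synthetic stand-in for expression features (hypothetical data).
random.seed(0)
target_train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
model = fit_gaussian_oc(target_train, reject_fraction=0.1)

print(accepts(model, (0.0, 0.0)))  # sample near the target class
print(accepts(model, (6.0, 6.0)))  # sample far from the target class
```

Because the threshold here is estimated from the training distances alone, it becomes unstable when few training samples are available, which is the difficulty noted at the end of the abstract.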
