Abstract

Computer-aided diagnosis has the potential to increase diagnostic accuracy by providing a second reading to radiologists. In many computerized schemes, numerous features can be extracted to describe suspect image regions. A subset of these features is then employed in a data classifier to determine whether the suspect region is abnormal or normal. Different subsets of features will, in general, result in different classification performance. A feature selection method is often used to determine an "optimal" subset of features to use with a particular classifier. A classifier performance measure (such as the area under the receiver operating characteristic curve) must be incorporated into this feature selection process. With limited datasets, however, there is a distribution in the classifier performance measure for a given classifier and subset of features. In this paper, we investigate the variation in the selected subset of "optimal" features, as compared with the true optimal subset of features, caused by this distribution of classifier performance. We consider examples in which the probability that the optimal subset of features is selected can be analytically computed. We show the dependence of this probability on the dataset sample size, the total number of features from which to select, the number of features selected, and the performance of the true optimal subset. Once a subset of features has been selected, the parameters of the data classifier must be determined. We show that, with limited datasets and/or a large number of features from which to choose, bias is introduced if the classifier parameters are determined using the same data that were employed to select the "optimal" subset of features.
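The bias described above can be illustrated with a minimal simulation. The sketch below is hypothetical (it is not the paper's experiment): every feature is pure noise, so the true AUC of each feature is 0.5, yet selecting the feature with the highest apparent AUC on a small dataset yields a systematically optimistic performance estimate, while re-evaluation on an independent dataset of the same size recovers an AUC near 0.5.

```python
import random

def auc(scores_pos, scores_neg):
    # Wilcoxon-Mann-Whitney estimate of the area under the ROC curve.
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def simulate(n_per_class=20, n_features=50, seed=0):
    rng = random.Random(seed)
    # Every feature is Gaussian noise, identically distributed in both
    # classes, so the true AUC of every feature is exactly 0.5.
    draw = lambda: [[rng.gauss(0, 1) for _ in range(n_features)]
                    for _ in range(n_per_class)]
    pos, neg = draw(), draw()
    # "Feature selection": pick the single feature with the highest
    # apparent AUC on this limited dataset.
    aucs = [auc([r[j] for r in pos], [r[j] for r in neg])
            for j in range(n_features)]
    best = max(range(n_features), key=lambda j: aucs[j])
    # Re-evaluate the selected feature on an independent dataset.
    pos2, neg2 = draw(), draw()
    fresh = auc([r[best] for r in pos2], [r[best] for r in neg2])
    return aucs[best], fresh

apparent, independent = zip(*(simulate(seed=s) for s in range(200)))
# Apparent AUC of the selected feature is optimistically biased;
# the independent estimate stays near the true value of 0.5.
```

Averaged over the 200 trials, the apparent AUC of the selected noise feature sits well above 0.5, while the independent estimate does not, mirroring the paper's point that reusing the selection data for performance estimation introduces bias.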
