Abstract

A problem in machine learning research arises when many candidate features exist but very few training examples are available. For example, microarray data typically have far more features (the genes) than training examples (the patients). One approach is to first determine the best features for prediction and then to group the features according to a measure of their relatedness. The concordance correlation coefficient has been used to place somewhat correlated features into disjoint groups of similar features. Multiple base classifiers are then created, each by randomly picking one feature from every group, and the collection of base classifiers forms an ensemble classifier. Each classifier in the ensemble casts a vote, and the majority vote gives the final class prediction. This paper investigates grouping features using fuzzy set similarity measures, as well as the concordance correlation coefficient, as the relatedness measure. The ensemble classifiers built with the different relatedness measures are compared in terms of accuracy, sensitivity, specificity, and F-measure. Four microarray gene expression data sets are used in the experiments to determine the usefulness of fuzzy set similarity measures and how they compare with the concordance correlation coefficient. Using the concordance correlation coefficient to guide clustering is not superior to using fuzzy set similarity measures: depending on the particular data set and performance measure, different fuzzy set similarity measures perform better than, or just as well as, the concordance correlation coefficient.
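The pipeline described above (relatedness measure, disjoint feature groups, one random feature per group per base classifier, majority vote) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering procedure, the base classifier, or the specific fuzzy similarity measures compared. Here Lin's concordance correlation coefficient is computed with population moments; the fuzzy measure is one common choice, a fuzzy Jaccard similarity (sum of minima over sum of maxima) applied to min-max scaled feature values treated as membership degrees; grouping is a greedy threshold scheme; and the base classifier is a nearest-centroid rule. All function names are hypothetical, and the columns are assumed to be already ranked by the earlier feature-selection step.

```python
import random
from collections import Counter
from statistics import mean


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient, using population (1/n) moments."""
    n = len(x)
    mx, my = mean(x), mean(y)
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = vx + vy + (mx - my) ** 2
    return 2 * cov / denom if denom else 1.0  # two identical constants agree fully


def minmax(x):
    """Assumption: scale a feature to [0, 1] so its values act as fuzzy memberships."""
    lo, hi = min(x), max(x)
    return [0.0] * len(x) if hi == lo else [(v - lo) / (hi - lo) for v in x]


def fuzzy_jaccard(x, y):
    """One example fuzzy set similarity: sum(min) / sum(max) of the memberships."""
    a, b = minmax(x), minmax(y)
    num = sum(min(p, q) for p, q in zip(a, b))
    den = sum(max(p, q) for p, q in zip(a, b))
    return num / den if den else 1.0


def group_features(columns, sim, threshold):
    """Hypothetical greedy grouping into disjoint groups: each ungrouped feature,
    taken in ranked order, seeds a group and absorbs every remaining feature
    whose similarity to the seed is at least `threshold`."""
    remaining = list(range(len(columns)))
    groups = []
    while remaining:
        seed = remaining.pop(0)
        group = [seed] + [j for j in remaining
                          if sim(columns[seed], columns[j]) >= threshold]
        remaining = [j for j in remaining if j not in group]
        groups.append(group)
    return groups


def nearest_centroid(X, y, feats):
    """Assumed base classifier: nearest class centroid on the chosen features."""
    cents = {}
    for label in set(y):
        rows = [r for r, t in zip(X, y) if t == label]
        cents[label] = [mean(r[f] for r in rows) for f in feats]

    def predict(row):
        return min(cents, key=lambda c: sum((row[f] - m) ** 2
                                            for f, m in zip(feats, cents[c])))
    return predict


def build_ensemble(X, y, groups, n_members, rng):
    """Each base classifier uses one randomly picked feature from every group."""
    return [nearest_centroid(X, y, [rng.choice(g) for g in groups])
            for _ in range(n_members)]


def majority_vote(ensemble, row):
    """Every base classifier votes; the most common label is the prediction."""
    return Counter(clf(row) for clf in ensemble).most_common(1)[0][0]


# Toy usage: rows are samples (patients), columns are features (genes).
X = [[1.0, 1.1, 5.0, 5.2],
     [1.2, 1.3, 5.1, 5.0],
     [4.0, 4.2, 1.0, 1.1],
     [4.1, 3.9, 0.9, 1.2]]
y = ["A", "A", "B", "B"]
columns = [list(c) for c in zip(*X)]

groups = group_features(columns, concordance_cc, threshold=0.9)
ensemble = build_ensemble(X, y, groups, n_members=5, rng=random.Random(0))
print(groups, majority_vote(ensemble, [1.1, 1.2, 5.0, 5.1]))
```

Swapping `concordance_cc` for `fuzzy_jaccard` (or any other similarity function with the same two-argument signature) changes only the grouping step, which mirrors the comparison the abstract describes; the threshold of 0.9 is an arbitrary value chosen for this toy data.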
