A stable iterative method for refining discriminative gene clusters

Min Xu, Mengxia Zhu, Louxin Zhang

doi:10.1186/1471-2164-9-s2-s18

Open Access

Abstract

Background: Microarray technology is often used to identify the genes that are differentially expressed between two biological conditions. Because microarray datasets contain a small number of samples and a large number of genes, it is usually desirable to identify small gene subsets with distinct patterns between sample classes. Such gene subsets are highly discriminative in phenotype classification because of their tightly coupled features. Unfortunately, classifiers identified in this way tend to generalize poorly to test samples because of overfitting.

Results: We propose a novel approach that combines supervised and unsupervised learning techniques to generate increasingly discriminative gene clusters in an iterative manner. Our experiments on both simulated and real datasets show that our method can produce a series of robust gene clusters with good classification performance compared with existing approaches.

Conclusion: This backward approach for refining a series of highly discriminative gene clusters for classification purposes proves to be very consistent and stable when applied to various types of training samples.

Highlights

Microarray technology is often used to identify the genes that are differentially expressed between two biological conditions
We have tested our method on other real datasets and compared the performance of our algorithm with those reported in the previous literature
We evaluate our algorithm using another cluster set Δ′2, the final set of active clusters generated by our algorithm with S' as the input gene subset and all 72 samples as the training samples, where S' is the set of 357 genes (10% of all 3,571 genes) that are most highly correlated with the acute myeloid leukemia (AML)/acute lymphoblastic leukemia (ALL) classes according to the correlation metric proposed in [1]
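
The gene pre-selection described in the last highlight (keeping the 10% of genes most correlated with the class labels) can be illustrated with a small sketch. The correlation metric of [1] is not reproduced on this page, so the score below is an assumption: a signal-to-noise style statistic, the difference of class means divided by the sum of class standard deviations. The function names, the 0/1 label coding, and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def signal_to_noise(values, labels):
    """Score one gene as (mean_0 - mean_1) / (sd_0 + sd_1).

    Assumed metric, not confirmed by the source text.
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    denom = stdev(a) + stdev(b)
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def top_fraction(expr, labels, fraction=0.10):
    """Return indices of the genes with the largest absolute score."""
    scores = [abs(signal_to_noise(row, labels)) for row in expr]
    k = max(1, round(fraction * len(expr)))
    order = sorted(range(len(expr)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data: 4 genes x 6 samples; labels 0/1 are hypothetical class codes.
labels = [0, 0, 0, 1, 1, 1]
expr = [
    [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],  # strongly class-dependent
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # uninformative
    [3.0, 0.5, 2.2, 1.0, 3.1, 0.7],  # noisy
    [4.0, 4.2, 4.1, 3.0, 3.2, 2.9],  # moderately class-dependent
]
selected = top_fraction(expr, labels, fraction=0.5)
print(selected)
```

With `fraction=0.10` and 3,571 genes, `round(0.10 * 3571)` gives 357, which matches the subset size quoted above. The clustering and iterative refinement steps of the method are not shown here.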

Summary

Introduction

Microarray technology is often used to identify the genes that are differentially expressed between two biological conditions. Because microarray datasets contain a small number of samples and a large number of genes, it is usually desirable to identify small gene subsets with distinct patterns between sample classes. Such gene subsets are highly discriminative in phenotype classification because of their tightly coupled features. Once the non-representative genes, which may dilute the pattern during classification, have been filtered out, the selected highly discriminative genes can be studied further to investigate the biological mechanisms responsible for class distinction.

Methods

Results

Conclusion