Abstract
Background: Microarray technology is often used to identify the genes that are differentially expressed between two biological conditions. Because microarray datasets contain a small number of samples and a large number of genes, it is usually desirable to identify small gene subsets with distinct patterns between sample classes. Such gene subsets are highly discriminative in phenotype classification because of their tightly coupled features. Unfortunately, classifiers identified in this way tend to generalize poorly to test samples because of overfitting.
Results: We propose a novel approach that combines supervised and unsupervised learning techniques to generate increasingly discriminative gene clusters in an iterative manner. Our experiments on both simulated and real datasets show that our method can produce a series of robust gene clusters with good classification performance compared with existing approaches.
Conclusion: This backward approach for refining a series of highly discriminative gene clusters for classification purposes proves to be very consistent and stable when applied to various types of training samples.
Highlights
Microarray technology is often used to identify the genes that are differentially expressed between two biological conditions
We have tested our method on other real datasets and compared the performance of our algorithm with those reported in the previous literature
We evaluate our algorithm using another cluster set Δ′2, the final set of active clusters generated by our algorithm with S′ as the input gene subset and all 72 samples as the training samples, where S′ is the set of 357 genes (10% of all 3,571 genes) that are most highly correlated with the acute myeloid leukemia (AML)/acute lymphoblastic leukemia (ALL) class distinction according to the correlation metric proposed in [1]
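The gene pre-selection mentioned in the last highlight (keeping the 10% of genes most correlated with the class labels) can be sketched as follows. This is only an illustration under stated assumptions: the metric of [1] is not reproduced on this page, so the sketch assumes a signal-to-noise style score (difference of class means divided by the sum of class standard deviations) and ranks genes by its absolute value. The function name `top_correlated_genes` and its parameters are hypothetical, and the data are synthetic.

```python
import numpy as np

def top_correlated_genes(X, y, fraction=0.10):
    """Return indices of the genes with the largest absolute class-separation score.

    X: array of shape (n_samples, n_genes) with expression values.
    y: array of binary class labels (0 or 1), one per sample.
    fraction: share of genes to keep (0.10 keeps 357 of 3,571 genes).

    ASSUMPTION: the score below is a signal-to-noise style statistic chosen
    for illustration; it is not confirmed to be the metric used in [1].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    # Sum of the per-class sample standard deviations for each gene.
    denom = a.std(axis=0, ddof=1) + b.std(axis=0, ddof=1)
    # Genes with zero spread in both classes get a score of 0.
    score = (a.mean(axis=0) - b.mean(axis=0)) / np.where(denom > 0, denom, np.inf)
    k = max(1, int(round(fraction * X.shape[1])))
    # Sort by descending absolute score and keep the first k gene indices.
    return np.argsort(-np.abs(score))[:k]

# Synthetic example: 72 samples, 3,571 genes, 5 genes shifted in class 1.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(72, 3571))
y_demo = np.array([0] * 36 + [1] * 36)
X_demo[y_demo == 1, :5] += 5.0
selected = top_correlated_genes(X_demo, y_demo)
```

Here `selected` holds the column indices of the retained genes, which would play the role of the input gene subset S′ for any downstream clustering step.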
Summary
Microarray technology is often used to identify the genes that are differentially expressed between two biological conditions. Because microarray datasets contain a small number of samples and a large number of genes, it is usually desirable to identify small gene subsets with distinct patterns between sample classes. Such gene subsets are highly discriminative in phenotype classification because of their tightly coupled features. Once the non-representative genes, which may dilute the pattern in the classification computation, have been filtered out, the selected highly discriminative genes can be studied further to investigate the biological mechanisms responsible for class distinction.