Abstract

Background

Microarray technology is often used to identify genes that are differentially expressed between two biological conditions. Because microarray datasets contain a small number of samples and a large number of genes, it is usually desirable to identify small gene subsets with distinct expression patterns between sample classes. Such gene subsets are highly discriminative in phenotype classification because their features are tightly coupled. Unfortunately, classifiers built on such subsets often generalize poorly to test samples because of overfitting.

Results

We propose a novel approach that combines supervised and unsupervised learning techniques to generate increasingly discriminative gene clusters in an iterative manner. Our experiments on both simulated and real datasets show that our method produces a series of robust gene clusters with good classification performance compared with existing approaches.

Conclusion

This backward approach for refining a series of highly discriminative gene clusters for classification proves to be very consistent and stable when applied to various types of training samples.

Highlights

  • We have tested our method on other real datasets and compared the performance of our algorithm with the results reported in previous literature

  • We evaluate our algorithm using another cluster set Δ′2: the final set of active clusters generated by our algorithm with S' as the input gene subset and all 72 samples as the training samples, where S' is the set of 357 genes (10% of all 3,571 genes) most highly correlated with the acute myeloid leukemia (AML)/acute lymphoblastic leukemia (ALL) classes under the correlation metric proposed in [1]
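Building S' means ranking every gene by how strongly it tracks the AML/ALL labels and keeping the top 10%. The metric of [1] is not spelled out here, so the sketch below assumes a signal-to-noise score, (mean_AML − mean_ALL)/(sd_AML + sd_ALL), as a stand-in; the function names `signal_to_noise` and `top_fraction` are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(values, labels, pos="AML", neg="ALL"):
    """Score one gene by (mean_pos - mean_neg) / (sd_pos + sd_neg).

    ASSUMPTION: a signal-to-noise statistic used as a stand-in for the
    correlation metric of [1]. Requires at least two samples per class and
    non-constant expression within at least one class.
    """
    a = [v for v, y in zip(values, labels) if y == pos]
    b = [v for v, y in zip(values, labels) if y == neg]
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def top_fraction(expr, labels, fraction=0.10):
    """Return the names of the top `fraction` of genes ranked by |score|.

    `expr` maps a gene name to its expression values, one per sample,
    in the same order as `labels`.
    """
    scores = {g: abs(signal_to_noise(v, labels)) for g, v in expr.items()}
    k = max(1, round(fraction * len(scores)))  # e.g. 10% of 3,571 genes -> 357
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical), four samples and four genes.
labels = ["AML", "AML", "ALL", "ALL"]
expr = {
    "g1": [5.0, 6.0, 1.0, 2.0],  # strongly separates the classes
    "g2": [1.0, 2.0, 1.0, 2.0],  # no class difference
    "g3": [2.0, 1.0, 4.0, 5.0],  # separates the classes, opposite sign
    "g4": [3.0, 4.0, 3.5, 4.5],  # weak class difference
}
print(top_fraction(expr, labels, fraction=0.5))  # ['g1', 'g3']
```

Ranking by the absolute score keeps genes that are up-regulated in either class, which matches the idea of genes "highly correlated" with the class distinction regardless of direction.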


Summary

Introduction

Microarray technology is often used to identify genes that are differentially expressed between two biological conditions. Because microarray datasets contain a small number of samples and a large number of genes, it is usually desirable to identify small gene subsets with distinct expression patterns between sample classes. Such gene subsets are highly discriminative in phenotype classification because their features are tightly coupled. Once non-representative genes, which may dilute the pattern in the classification computation, have been filtered out, the remaining highly discriminative genes can be studied further to investigate the biological mechanisms responsible for the class distinction.
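The iterative, backward refinement described in the abstract alternates an unsupervised step (grouping genes into clusters) with a supervised step (judging how well each cluster separates the classes) and discards the least discriminative genes each round. This section does not give the authors' actual algorithm, so the following is only a hypothetical illustration: k-means over z-scored gene profiles, leave-one-out nearest-centroid accuracy as the cluster score, and the drop-the-worst-cluster rule are all assumptions, as are the names `kmeans_genes`, `loo_accuracy`, and `backward_refine`.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score one gene's profile across samples (constant genes map to zeros)."""
    m, s = mean(values), pstdev(values)
    return [(x - m) / s if s > 0 else 0.0 for x in values]


def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_genes(profiles, k, iters=20):
    """Unsupervised step (assumed): k-means over gene profiles."""
    names = sorted(profiles)
    k = min(k, len(names))
    # Deterministic initialisation: evenly spaced genes in name order.
    centers = [profiles[names[i * len(names) // k]] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for g in names:
            nearest = min(range(k), key=lambda c: sqdist(profiles[g], centers[c]))
            clusters[nearest].append(g)
        # Recompute centroids; an empty cluster keeps its previous centre.
        centers = [
            [mean(col) for col in zip(*(profiles[g] for g in cl))] if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def loo_accuracy(genes, profiles, labels):
    """Supervised step (assumed): leave-one-out nearest-centroid accuracy
    using only `genes` as features. Needs at least 2 samples per class."""
    n = len(labels)
    classes = sorted(set(labels))  # fixed order so ties break deterministically
    correct = 0
    for i in range(n):
        train = [s for s in range(n) if s != i]
        centroids = {
            c: [mean(profiles[g][s] for s in train if labels[s] == c) for g in genes]
            for c in classes
        }
        x = [profiles[g][i] for g in genes]
        pred = min(classes, key=lambda c: sqdist(x, centroids[c]))
        correct += pred == labels[i]
    return correct / n


def backward_refine(expr, labels, k=3, min_genes=2):
    """Hypothetical backward loop: cluster the active genes, score each
    cluster, drop the least discriminative one, and repeat.
    Returns one list of (score, cluster) pairs per round, best first."""
    # Simplification: z-scoring uses all samples, including the one
    # held out in each leave-one-out fold.
    profiles = {g: standardize(v) for g, v in expr.items()}
    active = sorted(profiles)
    series = []
    while True:
        clusters = kmeans_genes({g: profiles[g] for g in active}, k)
        scored = sorted(
            ((loo_accuracy(cl, profiles, labels), cl) for cl in clusters),
            reverse=True,
        )
        series.append(scored)
        if len(clusters) == 1 or len(active) <= min_genes:
            return series
        worst = set(scored[-1][1])
        active = [g for g in active if g not in worst]


# Toy data (hypothetical): two informative genes and two noise genes.
labels = ["A", "A", "A", "B", "B", "B"]
expr = {
    "g1": [5.0, 6.0, 5.5, 1.0, 1.5, 1.0],
    "g2": [5.2, 6.1, 5.4, 1.1, 1.4, 0.9],
    "n1": [1.0, 3.0, 2.0, 2.0, 1.0, 3.0],
    "n2": [3.0, 1.0, 2.0, 1.0, 3.0, 2.0],
}
for rnd, scored in enumerate(backward_refine(expr, labels, k=2), start=1):
    print(rnd, scored)
```

Each round yields a smaller set of active genes, so the output is a series of progressively refined clusters rather than a single subset; scoring with leave-one-out accuracy is one simple way to penalize clusters that only fit the training samples.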

