Abstract

A major goal of applying Machine Learning techniques to high-throughput genomics data (e.g. DNA microarrays or RNA-Seq) is the identification of “gene signatures”. These signatures can be used to discriminate between healthy and disease states (e.g. normal vs cancerous tissue), or among different biological mechanisms, at the gene expression level. Accordingly, the literature abounds with studies in which numerous feature selection techniques are applied in an effort to reduce the noise and dimensionality of such datasets. However, little attention is given to the stability of these signatures when the original dataset is perturbed by adding, removing or simply resampling the original observations. In this article, we assess the stability of gene signatures derived from a set of well-characterized public cancer microarray datasets, using five popular feature selection algorithms in the field of high-throughput genomics data analysis.
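
To make the notion of signature stability under resampling concrete, the following is a minimal sketch and not the procedure used in this article. It assumes a binary class label, a deliberately simple ranking criterion (absolute difference of class means) standing in for the five algorithms studied, and the mean pairwise Jaccard index as the stability measure. All function names and parameters are hypothetical.

```python
import random
from itertools import combinations


def top_k_features(X, y, k):
    """Return the indices of the k features with the largest absolute
    difference between class means (an assumed, illustrative criterion)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:
            # A resample may contain a single class; score is then undefined,
            # so fall back to 0 for every feature.
            scores.append(0.0)
            continue
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def signature_stability(X, y, k, n_resamples=20, seed=0):
    """Mean pairwise Jaccard index of the signatures selected on
    bootstrap resamples of the observations (requires n_resamples >= 2)."""
    rng = random.Random(seed)
    n = len(X)
    signatures = []
    for _ in range(n_resamples):
        # Bootstrap: draw n observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        signatures.append(top_k_features(Xb, yb, k))
    pairs = list(combinations(signatures, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Here `X` is a list of observations (rows) by features (columns) and `y` a list of 0/1 labels. A value near 1 means that nearly the same features are selected on every resample, while a value near 0 means the signature changes substantially when the observations are perturbed.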
