Abstract
Background: Feature selection and gene set analysis are of increasing interest in the field of bioinformatics. While these two approaches have been developed for different purposes, we describe how some gene set analysis methods can be utilized to conduct feature selection.
Methods: We adopted a gene set analysis method, the significance analysis of microarray gene set reduction (SAMGSR) algorithm, to carry out feature selection for longitudinal gene expression data.
Results: Using a real-world application and simulated data, it is demonstrated that the proposed SAMGSR extension outperforms other relevant methods. In this study, we illustrate that a gene’s expression profiles over time can be regarded as a gene set and then a suitable gene set analysis method can be utilized directly to select relevant genes associated with the phenotype of interest over time.
Conclusions: We believe this work will motivate more research to bridge feature selection and gene set analysis, with the development of novel algorithms capable of carrying out feature selection for longitudinal gene expression data.
Highlights
Feature selection and gene set analysis are of increasing interest in the field of bioinformatics
In terms of computing time, a single run of the simple significance analysis of microarray gene set reduction (SAMGSR) algorithm takes 4.03 min on a Mac Pro equipped with a 2.2 GHz dual-core processor and 16 GB of RAM
Using a real-world application, we showed that the longitudinal SAMGSR method is superior to other relevant algorithms
Summary
Feature selection and gene set analysis are of increasing interest in the field of bioinformatics. While pathway analysis aims to identify relevant pathways/gene sets associated with a phenotype of interest, feature selection mainly focuses on the identification of relevant individual genes. These two tools seem to be parallel to each other. The statistical approach typically employed to analyze longitudinal omics data is to stratify the data into separate time points and tackle them separately. This naïve strategy is inefficient because the strong dependence among the measures of the same subject over time is erroneously ignored, leading to a huge loss of statistical power and a failure to detect incremental changes in gene expression patterns over time [6,7,8]. Separate applications of cross-sectional feature selection methods (where the gene expression values are measured at a single time point) are ineffective for longitudinal microarray data [8].
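The idea stated in the abstract, that a gene's expression profile over time can be regarded as a gene set, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names `longitudinal_gene_sets` and `set_score`, the gene-major column layout, the simulated data, and the constant `s0` are hypothetical. The score is a simple sum of squared standardized mean differences used as a stand-in for a set-level statistic; it is not the SAMGSR algorithm itself.

```python
import numpy as np

def longitudinal_gene_sets(gene_names, n_timepoints):
    """Map each gene to the column indices of its time-point features.

    Assumes columns are ordered gene-major: g0_t0, g0_t1, ..., g1_t0, ...
    """
    return {
        gene: list(range(i * n_timepoints, (i + 1) * n_timepoints))
        for i, gene in enumerate(gene_names)
    }

def set_score(X, y, cols, s0=1e-3):
    """Sum of squared standardized mean differences over one set's columns.

    Illustrative stand-in for a set-level statistic; s0 is a small
    assumed constant that stabilizes near-zero variances.
    """
    a, b = X[y == 0][:, cols], X[y == 1][:, cols]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return float(np.sum((diff / (se + s0)) ** 2))

# Simulated data: 20 subjects, 3 genes, 4 time points per gene.
rng = np.random.default_rng(0)
genes = ["geneA", "geneB", "geneC"]
T = 4
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, len(genes) * T))
X[y == 1, 0:T] += 3.0  # geneA is shifted at every time point in group 1

sets = longitudinal_gene_sets(genes, T)
scores = {g: set_score(X, y, cols) for g, cols in sets.items()}
top_gene = max(scores, key=scores.get)
```

Because each gene is scored over all of its time points jointly, the ranking uses the whole profile rather than one time point at a time, which is the contrast with the stratified strategy described above.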