Abstract
Background: Gene set analysis (GSA) methods test the association of sets of genes with a phenotype in gene expression microarray studies. Many GSA methods have been proposed, especially for use with a binary phenotype. Equally important, if not more so, is the ability to test the enrichment of a gene signature or pathway against continuous phenotypes, which are routinely observed in, for example, clinicopathological measurements. It is not always easy or meaningful to dichotomize a continuous phenotype into two classes, and attempting to do so may misclassify samples, which would affect the downstream enrichment analysis. In the present study, we have built on recent efforts to incorporate the correlation structure within gene sets and pathways into the GSA test statistic. New GSA methods that incorporate a covariance matrix estimator for a continuous phenotype may offer an effective way to handle continuous phenotypes directly, without artificial discrete classification, and thus to increase the power of the test while maintaining computational efficiency and rigor.

Results: We have designed a new method by extending the GSA approach called Linear Combination Test (LCT) from a binary to a continuous phenotype. Simulation studies and a real microarray dataset were used to compare the proposed LCT for a continuous phenotype, a modification of LCT (referred to as LCT2), and two publicly available GSA methods for continuous phenotypes.

Conclusions: We found that the LCT methods performed better than the other two GSA methods; however, this finding should be understood in the context of the specific simulation studies and the real microarray dataset that were used to compare the methods. Free R code to perform LCT for binary and continuous phenotypes is available at http://www.ualberta.ca/~yyasui/homepage.html.
The R-code to perform LCT for a continuous phenotype is available as Additional file 1.
Highlights
Gene set analysis (GSA) methods test the association of sets of genes with a phenotype in gene expression microarray studies
The R-code to perform Linear Combination Test (LCT) for a continuous phenotype is available as Additional file 1
When we compared the performances of various self-contained GSA methods for binary phenotypes, we found that LCT was more computationally efficient than MANOVA-GSA and approximated its superior power very well
Summary
Gene set analysis (GSA) methods test the association of sets of genes with a phenotype in gene expression microarray studies. The main challenge in using these methods is the combination of relatively small sample sizes and large gene sets, a situation that is not uncommon in GSA, especially in small microarray studies. To overcome this difficulty, shrinkage methods [7] have been used to estimate the gene expression covariance matrix. When we compared the performances of various self-contained GSA methods for binary phenotypes, we found that LCT was more computationally efficient than MANOVA-GSA and approximated its superior power very well. We propose both an extension of LCT to a continuous phenotype (hereafter referred to as LCT) and a modified version of LCT (hereafter referred to as LCT2). We used simulations to compare the performances of the GSA methods with small sample sizes and large gene sets.
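To illustrate why shrinkage helps when there are fewer samples than genes, the following minimal sketch shrinks a sample covariance matrix toward its own diagonal. It is not the estimator of [7] and not the authors' R code: the function names `sample_covariance` and `shrink_to_diagonal`, the toy data, and the fixed shrinkage intensity `lam = 0.5` are all assumptions made for illustration (published shrinkage estimators typically choose the intensity from the data).

```python
import random


def sample_covariance(rows):
    """Unbiased sample covariance of an n-by-p data matrix given as a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def shrink_to_diagonal(cov, lam):
    """Return (1 - lam) * cov + lam * diag(cov).

    Variances on the diagonal are unchanged; covariances are pulled toward zero.
    `lam` is a user-supplied intensity in [0, 1] (an assumption of this sketch).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p = len(cov)
    return [
        [cov[i][j] if i == j else (1.0 - lam) * cov[i][j] for j in range(p)]
        for i in range(p)
    ]


# Toy expression matrix: 4 samples (rows) and 6 genes (columns), so n < p.
rng = random.Random(0)
expr = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(4)]

S = sample_covariance(expr)                # singular, because n - 1 < p
S_shrunk = shrink_to_diagonal(S, lam=0.5)  # hypothetical fixed intensity
```

With n < p the sample covariance matrix has rank at most n - 1 and cannot be inverted. The shrunken matrix is a weighted sum of that positive semidefinite matrix and a diagonal matrix of positive variances, so it is positive definite for any `lam` greater than zero and can be used in a test statistic that requires an inverse.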