Abstract

Background: Identifying sets of related genes (gene sets) that are empirically associated with a treatment or phenotype often yields valuable biological insights. Several methods effectively identify gene sets in which individual genes have simple monotonic relationships with categorical, quantitative, or censored event-time variables. Some distance-based methods, such as distance correlations, may detect complex non-monotone associations of a gene-set with a quantitative variable that elude other methods. However, the distance correlations have yet to be generalized to associate gene-sets with categorical and censored event-time endpoints. Also, there is a need to determine which genes empirically drive the significance of an association of a gene set with an endpoint.

Results: We develop gene-set distance analysis (GSDA) by generalizing distance correlations to evaluate the association of a gene set with categorical and censored event-time variables. We also develop a backward elimination procedure to identify a subset of genes that empirically drive significant associations. In simulation studies, GSDA more effectively identified complex non-monotone gene-set associations than did six other published methods. In the analysis of a pediatric acute myeloid leukemia (AML) data set, GSDA was the only method to discover that event-free survival (EFS) was associated with the 56-gene AML pathway gene-set, narrow that result down to 5 genes, and confirm the association of those 5 genes with EFS in a separate validation cohort. These results indicate that GSDA effectively identifies and characterizes complex non-monotonic gene-set associations that are missed by other methods.

Conclusion: GSDA is a powerful and flexible method to detect gene-set association with categorical, quantitative, or censored event-time variables, especially to detect complex non-monotonic gene-set associations. Available at https://CRAN.R-project.org/package=GSDA.
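
To illustrate why a distance correlation can pick up non-monotone associations, the following is a minimal sketch of the standard sample distance correlation for a quantitative endpoint. It is not the GSDA implementation and does not cover the categorical or censored event-time generalizations described above. The function names (`distance_correlation`, `pearson`) and the toy data are hypothetical and chosen only for this example.

```python
import math


def distance_matrix(rows):
    """Pairwise Euclidean distances between observations (tuples of floats)."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    n = len(d)
    row_means = [sum(r) / n for r in d]
    grand = sum(row_means) / n
    # d is symmetric, so the column means equal the row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x_rows, y_rows):
    """Sample distance correlation between two sets of paired observations."""
    a = double_center(distance_matrix(x_rows))
    b = double_center(distance_matrix(y_rows))
    n = len(a)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0:
        return 0.0
    # max() guards against tiny negative values from floating-point error.
    return math.sqrt(max(dcov2, 0.0) / denom)


def pearson(x, y):
    """Ordinary Pearson correlation, shown only for contrast."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


# Toy example (assumed data): the endpoint is a U-shaped function of one gene.
expression = [(-2.0,), (-1.0,), (0.0,), (1.0,), (2.0,)]
endpoint = [(g[0] ** 2,) for g in expression]

linear_r = pearson([g[0] for g in expression], [e[0] for e in endpoint])
dcor = distance_correlation(expression, endpoint)
```

In this toy case the Pearson correlation is zero because the relationship is symmetric and non-monotone, while the distance correlation is clearly positive. Each observation is a tuple, so the same function accepts a multi-gene expression vector per subject.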

Highlights

  • Identifying sets of related genes that are empirically associated with a treatment or phenotype often yields valuable biological insights

  • Significance analysis of function and expression (SAFE) computes an association statistic for each gene, ranks individual genes according to that statistic, and computes a Wilcoxon rank-sum statistic to compare the ranks of gene-set genes to those of other genes

  • Simulation studies: We performed a series of simulation studies to evaluate the performance of the proposed gene-set distance analysis (GSDA) method, gene-set enrichment analysis (GSEA), gene-set association (GSA), SAFE, global test (GT), total of test statistics (TOTS), and projection onto orthogonal statistical tests (POST) in simple settings involving categorical, numeric, and survival outcomes (SC, SN, and SS) and in complex settings involving categorical, numeric, and survival outcomes (CC, CN, and CS)
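
The rank-based step in the SAFE highlight above can be sketched as follows. This is a simplified illustration under stated assumptions, not the SAFE software: the per-gene statistics are made-up numbers, the helper names (`average_ranks`, `rank_sum_statistic`) are hypothetical, and significance assessment is omitted.

```python
def average_ranks(values):
    """Rank values from 1 to n in ascending order; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_statistic(gene_stats, gene_set):
    """Return (W, expected W under no enrichment) for the genes in gene_set.

    gene_stats maps gene name -> per-gene association statistic.
    W is the sum of the ranks of the gene-set genes among all genes.
    """
    genes = list(gene_stats)
    ranks = average_ranks([gene_stats[g] for g in genes])
    rank_of = dict(zip(genes, ranks))
    members = [g for g in genes if g in gene_set]
    w = sum(rank_of[g] for g in members)
    expected = len(members) * (len(genes) + 1) / 2
    return w, expected


# Hypothetical per-gene statistics (e.g. absolute test statistics).
stats = {"g1": 0.2, "g2": 3.1, "g3": 0.5, "g4": 2.7, "g5": 0.1, "g6": 1.0}
w, expected = rank_sum_statistic(stats, {"g2", "g4"})
```

Here the two gene-set genes hold the two highest ranks, so the rank sum W exceeds the value expected if gene-set membership were unrelated to the per-gene statistic.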


Summary

Introduction

Biomedical researchers frequently seek to determine which gene pathways or ontologies are affected by a treatment or involved in biological processes that influence a particular phenotype or clinical outcome. Several analysis methods combine gene annotations with statistical analysis results to evaluate whether sets of genes that share a biological annotation are associated with a treatment or outcome, and these methods have led to many scientific discoveries. SAFE has been generalized to evaluate associations of gene sets with other phenotypes or endpoints, including quantitative variables and censored event-time variables, such as survival times in oncology studies.

