Abstract

Background: As new technologies allow investigators to collect multiple forms of molecular data (genomic, epigenomic, transcriptomic, etc.) and multiple endpoints on a clinical trial cohort, it will become necessary to effectively integrate all these data in a way that reliably identifies biologically important genes.

Methods: We introduce CC-PROMISE as an integrated data analysis method that combines components of canonical correlation (CC) and projection onto the most interesting evidence (PROMISE). For each gene, CC-PROMISE first uses CC to compute scores that represent the association of two forms of molecular data with each other. Next, these scores are substituted into PROMISE to evaluate the statistical evidence that the molecular data show a biologically meaningful relationship with the endpoints.

Results: CC-PROMISE shows outstanding performance in simulation studies and an example application involving pediatric leukemia. In simulation studies, CC-PROMISE controls the type I error (misleading significance) rate very near the nominal level across 100 distinct null settings in which no molecular-endpoint association exists. Also, CC-PROMISE has better statistical power than three other methods that control type I error in 396 of 400 (99 %) alternative settings for which a molecular-endpoint association is present; the power advantage of CC-PROMISE exceeds 30 % in 127 of the 400 (32 %) alternative settings. These advantages of CC-PROMISE are also observed in an example application.

Conclusion: CC-PROMISE very effectively identifies genes for which some form of molecular data shows a biologically meaningful association with multiple related endpoints.

Availability: The R package CCPROMISE is currently available from www.stjuderesearch.org/site/depts/biostats/software.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1217-0) contains supplementary material, which is available to authorized users.
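The two-step procedure described in the Methods portion of the abstract (canonical correlation scores first, then a PROMISE-style test against the endpoints) can be illustrated with a minimal Python sketch for a single gene. This is not the CCPROMISE R package and does not reproduce its API. The function names (`first_canonical_scores`, `promise_stat`, `cc_promise_sketch`), the use of Pearson correlation as the association statistic, the averaging over the two canonical scores, and the two-sided permutation p-value are all assumptions made for illustration. The sketch also assumes more subjects than features per gene and full-column-rank data blocks.

```python
import numpy as np


def first_canonical_scores(X, Y):
    """First pair of canonical variate scores for blocks X (n x p) and Y (n x q).

    Assumes n > p, n > q and full column rank (illustrative simplification).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the two centered column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    u = Qx @ U[:, 0]   # score summarizing the first form of molecular data
    v = Qy @ Vt[0, :]  # score summarizing the second form of molecular data
    return u, v, float(s[0])


def promise_stat(scores, endpoints, pattern):
    """Project score-endpoint correlations onto a prespecified pattern.

    scores: (n, m), endpoints: (n, k), pattern: (k,) of expected signs.
    Pearson correlation is an assumed stand-in for the association statistic.
    """
    S = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    E = (endpoints - endpoints.mean(axis=0)) / endpoints.std(axis=0)
    R = S.T @ E / S.shape[0]  # (m, k) matrix of Pearson correlations
    return float((R @ pattern).mean() / len(pattern))


def cc_promise_sketch(X, Y, endpoints, pattern, n_perm=999, seed=0):
    """Return (statistic, permutation p-value, first canonical correlation)."""
    rng = np.random.default_rng(seed)
    u, v, rho = first_canonical_scores(X, Y)
    scores = np.column_stack([u, v])
    obs = promise_stat(scores, endpoints, pattern)
    n = endpoints.shape[0]
    # Permute subjects' endpoint rows jointly so that the correlation
    # among endpoints is preserved under the null.
    perm = np.array([
        promise_stat(scores, endpoints[rng.permutation(n)], pattern)
        for _ in range(n_perm)
    ])
    # The sign of canonical scores is arbitrary, so compare absolute values.
    p = (1 + np.sum(np.abs(perm) >= abs(obs))) / (n_perm + 1)
    return obs, float(p), rho


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 80
    z = rng.normal(size=n)                          # hypothetical latent gene activity
    X = z[:, None] + rng.normal(size=(n, 3))        # e.g. three methylation probes
    Y = z[:, None] + rng.normal(size=(n, 2))        # e.g. two expression probes
    E = np.column_stack([z + rng.normal(size=n),    # endpoint expected to rise
                         -z + rng.normal(size=n)])  # endpoint expected to fall
    print(cc_promise_sketch(X, Y, E, np.array([1.0, -1.0]), n_perm=499))
```

Because the canonical scores do not depend on the endpoints, they are computed once and reused across permutations; only the endpoint rows are shuffled. In a genome-wide analysis the same routine would be applied gene by gene, followed by a multiple-testing adjustment, which this sketch omits.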

Highlights

  • As new technologies allow investigators to collect multiple forms of molecular data and multiple endpoints on a clinical trial cohort, it will become necessary to effectively integrate all these data in a way that reliably identifies biologically important genes

  • Availability: The R package CCPROMISE is currently available from www.stjuderesearch.org/site/depts/biostats/software

  • Simulation studies, data generation: We performed simulation studies to evaluate the statistical properties of CC-PROMISE, PROMISE, and list overlap approaches as methods for integrated analysis of two forms of molecular data and multiple endpoints


Summary

Introduction

As new technologies allow investigators to collect multiple forms of molecular data (genomic, epigenomic, transcriptomic, etc.) and multiple endpoints on a clinical trial cohort, it will become necessary to effectively integrate all these data in a way that reliably identifies biologically important genes. Advances in microarray and sequencing technologies have empowered the scientific community to economically and rapidly collect multiple forms of molecular ‘omic’ data for large cohorts of patients. These molecular data have provided intriguing insights into the development of. Genome-wide association studies (GWAS) have explored the association of one form of molecular data with one clinical endpoint of interest. GWAS and the associated data analyses have yielded many intriguing biological insights, as enumerated by the GWAS catalog (https://www.ebi.ac.uk/gwas/).

