Abstract
Background: Discovering genetic associations between genetic markers and gene expression levels can provide insight into gene regulation and, potentially, mechanisms of disease. Such analyses typically involve a linkage or association analysis in which expression data are used as phenotypes. This approach leads to a large number of multiple comparisons and may therefore lack power. We assess the potential of applying canonical correlation analysis to partitioned genomewide data as a method for discovering regulatory variants.
Methodology/Principal Findings: Simulations suggest that canonical correlation analysis has higher power than standard pairwise univariate regression to detect single nucleotide polymorphisms when the expression trait has low heritability. The increase in power is even greater under the recessive model. We demonstrate this approach using the Childhood Asthma Management Program data.
Conclusions/Significance: Our approach reduces multiple comparisons and may provide insight into the complex relationships between genotype and gene expression.
Highlights
The usefulness of examining associations between genetic markers and gene expression is due to the immediate and direct relationship between the gene expression phenotype and DNA sequence variation
Canonical correlation analysis (CCA) cannot be applied to all single nucleotide polymorphisms (SNPs) and expression probes in a genomewide association study since the number of variables is greater than the number of subjects
Under the alternative hypothesis of association, the power to detect a significant correlation with Bartlett’s test is compared with the power to detect the simulated association by regressing the expression quantitative trait locus of interest on the number of copies of the risk allele using Bonferroni correction to adjust for 60 pairwise tests (Table 2)
Summary
The usefulness of examining associations between genetic markers and gene expression is due to the immediate and direct relationship between the gene expression phenotype and DNA sequence variation. CCA finds a linear combination of the genotypes and a linear combination of the expression levels such that the correlation between the two is maximized. As it is, CCA cannot be applied to all SNPs and expression probes in a genomewide association study since the number of variables is greater than the number of subjects. Two modifications of CCA have recently been proposed for use with genetic marker and gene expression data: penalized CCA [3] and sparse CCA [4]. These methods are computationally intensive and are sometimes sensitive to starting parameters. Discovering genetic associations between genetic markers and gene expression levels can provide insight into gene regulation and, potentially, mechanisms of disease. Such analyses typically involve a linkage or association analysis in which expression data are used as phenotypes. We assess the potential of applying canonical correlation analysis to partitioned genomewide data as a method for discovering regulatory variants.
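To make the idea of CCA on one partition of the data concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It computes canonical correlations through the standard QR/SVD route and then the usual Bartlett chi-square approximation for testing that all canonical correlations are zero. The function names, the simulated data, the effect size, and the block sizes (10 SNPs and 6 expression probes, chosen only so that the number of SNP-probe pairs is 60) are assumptions made for illustration; the actual partition used in the study is not stated here.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between X (n x p) and Y (n x q).

    Assumes both centered matrices have full column rank and n > p + q.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered blocks.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx' Qy are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


def bartlett_statistic(r, n, p, q):
    """Bartlett's chi-square approximation for H0: all canonical correlations are 0.

    Returns (statistic, degrees_of_freedom). Requires every r < 1.
    """
    r = np.asarray(r, dtype=float)
    stat = -(n - 1 - (p + q + 1) / 2.0) * np.sum(np.log(1.0 - r ** 2))
    return stat, p * q


# Hypothetical simulated partition: 200 subjects, 10 SNPs, 6 probes.
rng = np.random.default_rng(0)
n, p, q = 200, 10, 6
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # allele counts 0/1/2
Y = rng.normal(size=(n, q))
Y[:, 0] += 0.5 * X[:, 0]  # one SNP affects one probe (illustrative effect)

r = canonical_correlations(X, Y)
stat, df = bartlett_statistic(r, n, p, q)

# One joint test for the block, versus p * q pairwise regressions that would
# each be judged at a Bonferroni-adjusted level.
bonferroni_alpha = 0.05 / (p * q)
print("largest canonical correlation:", r[0])
print("Bartlett statistic:", stat, "on", df, "degrees of freedom")
print("per-test level for pairwise regression:", bonferroni_alpha)
# A p-value could be obtained from the chi-square distribution with df
# degrees of freedom, for example with scipy.stats.chi2.sf(stat, df).
```

The sketch shows why partitioning matters: the QR step needs more subjects than variables in each block, which is the constraint that prevents running CCA on all SNPs and probes at once.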