Abstract

To better understand the heritability that conventional Genome-Wide Association Studies (GWAS) leave unexplained for complex diseases, aggregated association analyses based on predefined functional regions, such as genes and pathways, have recently become popular. They evaluate the joint effect of multiple Single-Nucleotide Polymorphisms (SNPs), which helps increase detection power, especially for genetic variants with weak individual effects. In this paper, we focus on aggregated analysis methods based on Principal Component Analysis (PCA). Most previous PCA-based approaches make inherent assumptions about the genotype data and/or the risk effect model, which may hinder the accurate detection of SNPs that influence disease phenotypes. We derive a general Supervised Categorical Principal Component Analysis (SCPCA), which explicitly models categorical SNP data without imposing any risk effect model assumption. We evaluate SCPCA against a traditional Supervised PCA (SPCA) and a previously developed Supervised Logistic Principal Component Analysis (SLPCA), using both genotype data simulated with HAPGEN2 and Crohn's Disease (CD) genotype data from the Wellcome Trust Case Control Consortium (WTCCC). Our preliminary results demonstrate the superiority of SCPCA over both SPCA and SLPCA, owing to its modeling explicitly designed for categorical SNP data and its flexibility with respect to the risk effect model.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-S1-S10) contains supplementary material, which is available to authorized users.

Highlights

  • Genome-wide association studies (GWAS) aim to detect the association of genetic variants across the whole genome with traits of interest such as disease phenotypes

  • We have derived Categorical PCA (CPCA) for aggregated association analysis of categorical Single-Nucleotide Polymorphism (SNP) data, and extended it to a supervised framework as Supervised CPCA (SCPCA)

  • Our SCPCA captures more relevant information from SNP data through better data modelling, and uses a heuristic selection procedure to aggregate genotypic information from multiple SNPs into a combined signal that is most strongly associated with the trait
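To make the idea of aggregating several SNPs into one trait-associated signal concrete, here is a minimal sketch of the general supervised-PCA recipe: screen features by univariate association with the trait, then take the first principal component of the retained features. It is not the SCPCA model derived in the paper. The function names, the 0/1/2 genotype coding, the indicator (one-hot) encoding used to avoid an additive risk assumption, and the correlation-based screening rule are all assumptions made for illustration.

```python
import numpy as np

def one_hot_genotypes(G):
    """Encode an (n_samples, n_snps) genotype matrix coded 0/1/2 as
    indicator columns, so no additive risk model is assumed.
    Column 3*j + level is 1 where SNP j has genotype `level`."""
    n, p = G.shape
    X = np.zeros((n, 3 * p))
    for level in range(3):
        X[:, level::3] = (G == level)
    return X

def supervised_pca_score(X, y, n_keep):
    """Hypothetical helper: keep the n_keep columns most correlated with
    the trait y, then return their indices and the first principal
    component score of the retained columns."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sd = Xc.std(axis=0)
    sd[sd == 0] = 1.0  # constant columns get zero correlation
    corr = np.abs(Xc.T @ yc) / (sd * yc.std() * len(y))
    keep = np.argsort(corr)[::-1][:n_keep]
    U, S, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    pc1 = U[:, 0] * S[0]  # one aggregated signal per sample
    return keep, pc1

# Toy data (simulated here, not from the paper): 200 samples, 10 SNPs,
# with a recessive-like effect at SNP 0.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 10))
y = (G[:, 0] == 2).astype(float)
keep, pc1 = supervised_pca_score(one_hot_genotypes(G), y, n_keep=5)
```

In this toy setting the indicator column for genotype 2 at SNP 0 reproduces the trait exactly, so it is ranked first by the screening step; an additive 0/1/2 coding would only capture that pattern partially.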

Introduction

Genome-wide association studies (GWAS) aim to detect the association of genetic variants across the whole genome with traits of interest such as disease phenotypes. They have been successful in identifying susceptibility loci through association analysis of individual single nucleotide polymorphism (SNP) markers with common diseases [1]. The associated common variants at the identified susceptibility loci have been found to have only modest individual effects [4]. It has always been a challenge for GWAS to detect SNPs that have weak individual effects but may affect disease outcome through strong epistatic effects. As GWAS focus on single-marker association tests, the obtained results may not provide clear insights into which genes

