Abstract

Background: In genetic association studies, especially in GWAS, gene- or region-based methods have become increasingly popular for detecting associations between multiple SNPs and diseases (or traits). Kernel principal component analysis combined with a logistic regression test (KPCA-LRT) has been successfully used in classifying gene expression data. Nevertheless, the purpose of an association study is to detect the correlation between genetic variations and disease rather than to classify samples, and genomic data are categorical rather than numerical. Although a kernel-based logistic regression model for association studies has recently been proposed, which projects the nonlinear original SNP data into a linear feature space, it is still affected by multicollinearity between the projections, which may lead to loss of power. We therefore proposed a KPCA-LRT model to avoid the multicollinearity.

Results: Simulation results showed that KPCA-LRT was always more powerful than principal component analysis combined with a logistic regression test (PCA-LRT) at different sample sizes, different significance levels and different relative risks, especially at the genewide level (1E-5) and lower relative risks (RR = 1.2, 1.3). Application to the four gene regions of the rheumatoid arthritis (RA) data from Genetic Analysis Workshop 16 (GAW16) indicated that KPCA-LRT performed better than the single-locus test and PCA-LRT.

Conclusions: KPCA-LRT is a valid and powerful gene- or region-based method for the analysis of GWAS data sets, especially under lower relative risks and lower significance levels.
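The KPCA-LRT idea described in the abstract can be illustrated with a minimal sketch: build a kernel matrix from the SNP genotypes, extract the leading kernel principal components, and compare a logistic regression on those components against an intercept-only model with a likelihood ratio test. Everything specific here is an assumption made for illustration and is not taken from the paper: the RBF kernel and its `gamma`, the number of components, the helper names (`rbf_kernel`, `kernel_pcs`, `logistic_loglik`, `kpca_lrt`), and the simulated genotypes.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Pairwise squared Euclidean distances between rows (individuals).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pcs(K, n_components):
    # Centre the kernel matrix in feature space, then eigendecompose it.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(H @ K @ H)       # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]  # keep the largest ones
    # Scores of the training samples on each kernel principal component.
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def logistic_loglik(Z, y, n_iter=50, tol=1e-8):
    # Maximised log-likelihood of a logistic model, fitted by Newton's method.
    # Z must already contain an intercept column.
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        grad = Z.T @ (y - p)
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = Z @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta))

def kpca_lrt(G, y, n_components=2, gamma=None):
    # Assumed default bandwidth: 1 / number of SNPs.
    gamma = 1.0 / G.shape[1] if gamma is None else gamma
    scores = kernel_pcs(rbf_kernel(G, gamma), n_components)
    ones = np.ones((G.shape[0], 1))
    ll_null = logistic_loglik(ones, y)
    ll_full = logistic_loglik(np.hstack([ones, scores]), y)
    # LRT statistic; under H0 it is referred to a chi-square
    # distribution with n_components degrees of freedom.
    return 2.0 * (ll_full - ll_null), n_components, scores

# Toy data: 200 individuals, 10 SNPs coded 0/1/2, phenotype unrelated to genotype.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(200, 10)).astype(float)
y = rng.integers(0, 2, size=200).astype(float)
stat, df, scores = kpca_lrt(G, y)
print(f"LRT statistic = {stat:.3f} on {df} degrees of freedom")
```

A p-value would follow from the chi-square survival function at `stat` with `df` degrees of freedom (for example `scipy.stats.chi2.sf(stat, df)`). Because the kernel component scores are mutually orthogonal, the regression avoids the multicollinearity that affects a model fitted on the raw kernel projections.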

Highlights

  • In genetic association studies, especially in genome-wide association studies (GWAS), gene- or region-based methods have become increasingly popular for detecting associations between multiple single nucleotide polymorphisms (SNPs) and diseases.

  • Type I error: simulation results under H0 are shown in Table 1, which indicates that the type I error rates of both principal component analysis (PCA)-LRT and kernel PCA (KPCA)-LRT are very close to the given nominal values (α = 0.01, α = 0.05) under different sample sizes.

  • It is clear that KPCA-LRT is always much more powerful than PCA-LRT, especially at the significance level of 1E-5.


Summary

Introduction

In genetic association studies, especially in GWAS, gene- or region-based methods have become increasingly popular for detecting associations between multiple SNPs and diseases (or traits). To examine whether multiple SNPs in a candidate gene or region are associated with a disease or trait, several multi-marker analysis methods have been developed, including haplotype-based methods [16,17], Hotelling's T² test [18,19], principal component analysis (PCA)-based methods [20,21,22,23], and P-value combination methods [11,24,25]. PCA can capture linkage disequilibrium information within a candidate gene/region, but is less computationally demanding than haplotype-based analysis. It also avoids multicollinearity between SNPs, because the principal components (PCs) are orthogonal.
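The orthogonality argument can be checked numerically. The following is a small sketch on simulated data, not the authors' analysis: the genotype simulation (a shared latent factor standing in for linkage disequilibrium), the thresholds, and all variable names are assumptions made for illustration. It compares the correlation between the SNPs with the covariance between their principal component scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 6  # individuals, SNPs (toy sizes)

# Correlated liabilities through a shared latent factor, a crude stand-in
# for linkage disequilibrium, then thresholded into 0/1/2 genotype codes.
latent = rng.normal(size=(n, 1))
liability = 0.8 * latent + 0.6 * rng.normal(size=(n, m))
G = (liability > -0.5).astype(float) + (liability > 0.5).astype(float)

# PCA via SVD of the column-centred genotype matrix.
Gc = G - G.mean(axis=0)
U, s, Vt = np.linalg.svd(Gc, full_matrices=False)
pcs = Gc @ Vt.T  # principal component scores, one column per PC

# SNPs are strongly correlated with each other ...
snp_corr = np.corrcoef(G, rowvar=False)
max_snp_corr = np.max(np.abs(snp_corr - np.eye(m)))

# ... whereas the PC scores are mutually uncorrelated.
pc_cov = np.cov(pcs, rowvar=False)
max_pc_offdiag = np.max(np.abs(pc_cov - np.diag(np.diag(pc_cov))))

print(f"largest |correlation| between SNPs:     {max_snp_corr:.3f}")
print(f"largest |covariance| between PC scores: {max_pc_offdiag:.2e}")
```

Using the PC scores rather than the raw SNPs as covariates in a logistic regression therefore gives a design matrix without collinear columns, which is the property the PCA-based tests rely on.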

