Abstract

Genome-wide association studies (GWAS) are widely used to identify genetic variants associated with disease risk. To overcome the limitations of single-locus association analysis, many approaches have been proposed that test multiple single nucleotide polymorphisms (SNPs) in a region simultaneously. Kernel machine based SNP set analysis, which borrows information from SNPs correlated with causal or tag SNPs, is more powerful than single-locus analysis. Four types of kernel machine functions and a principal component based approach (PCA) were also compared. However, because low minor allele frequencies (MAF) cause a loss of power, we extended PCA to a new method, weighted principal component analysis (wPCA). We compared wPCA, the logistic kernel machine based test (LKM) and PCA for SNP set analysis across different MAFs and linkage disequilibrium (LD) structures. We also applied the three methods to two SNP sets extracted from a real GWAS dataset of non-small cell lung cancer in a Han Chinese population. Simulation results show that when the MAF of the causal SNP is low, wPCA and the weighted identity-by-state (wIBS) kernel are more powerful than PCA and the other kernel machine functions across different LD structures and different numbers of causal SNPs. Application of the three methods to the real GWAS dataset indicates that wPCA and wIBS perform better than the linear kernel, the IBS kernel and PCA.
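The weighted SNP set idea can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm. SNPs are coded as minor-allele counts (0/1/2). Per-SNP weights come from the Beta(1, 25) density of the MAF (the default weighting in the SKAT package), because the weights actually used for wPCA and wIBS are not stated here. The PCA test is a likelihood ratio test of case status on the top principal components. The weighted IBS kernel takes one common form, K(i, j) = Σ_k w_k IBS_k(i, j) / (2 Σ_k w_k). All function names are hypothetical. The kernel is only constructed; the LKM score test itself is omitted.

```python
import numpy as np

def minor_allele_freq(G):
    """Per-SNP MAF from a genotype matrix coded 0/1/2 (rows = subjects)."""
    p = G.mean(axis=0) / 2.0          # frequency of the counted allele
    return np.minimum(p, 1.0 - p)     # fold so every value is <= 0.5

def rare_variant_weights(maf):
    """Assumed weighting: Beta(1, 25) density at the MAF, i.e. 25 * (1 - maf)**24.
    Rare variants get weights near 25; common ones get weights near 0."""
    return 25.0 * (1.0 - maf) ** 24

def logistic_loglik(X, y, n_iter=50, tol=1e-8):
    """Maximized log-likelihood of a logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))                      # fitted probabilities
        grad = X.T @ (y - mu)                                # score vector
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])        # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def pca_lrt(G, y, n_pcs=3, weighted=True):
    """LRT of case status on the top principal components of a SNP set.
    weighted=True rescales each centred SNP column by its MAF-based weight
    before the SVD (a wPCA-style sketch); weighted=False is plain PCA.
    Returns (statistic, degrees of freedom)."""
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = G - G.mean(axis=0)
    if weighted:
        Z = Z * rare_variant_weights(minor_allele_freq(G))
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    pcs = U[:, :n_pcs] * s[:n_pcs]                           # PC scores per subject
    ones = np.ones((len(y), 1))
    ll_null = logistic_loglik(ones, y)                       # intercept-only model
    ll_alt = logistic_loglik(np.hstack([ones, pcs]), y)      # intercept + PCs
    return 2.0 * (ll_alt - ll_null), n_pcs

def weighted_ibs_kernel(G, weights):
    """K[i, j] = sum_k w_k * IBS_k(i, j) / (2 * sum_k w_k), IBS_k = 2 - |g_ik - g_jk|."""
    G = np.asarray(G, dtype=float)
    ibs = 2.0 - np.abs(G[:, None, :] - G[None, :, :])        # shape (n, n, m)
    return (ibs * weights).sum(axis=-1) / (2.0 * weights.sum())

# Toy usage on simulated data (no real genotypes involved).
rng = np.random.default_rng(0)
G = rng.binomial(2, [0.02, 0.05, 0.1, 0.3, 0.4], size=(200, 5))
y = rng.binomial(1, 0.5, size=200)
stat, df = pca_lrt(G, y, n_pcs=2)
K = weighted_ibs_kernel(G, rare_variant_weights(minor_allele_freq(G)))
```

Under the null hypothesis the statistic is compared with a chi-square distribution on `df` degrees of freedom (for example via `scipy.stats.chi2.sf(stat, df)`). Because the principal components enter the logistic model linearly, the weights change the test only through which directions the SVD ranks first. This is how weighting can steer the leading components toward low-MAF SNPs.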

Highlights

  • Genome-wide association studies (GWAS) have become a popular approach for studying the genetic susceptibility of complex diseases

  • This work extends Zhao et al. to determine whether weighted single nucleotide polymorphism (SNP) set analysis increases statistical power when minor allele frequencies (MAF) are low and under different linkage disequilibrium (LD) structures

  • For weighted PCA (wPCA) and the principal component based approach (PCA), type I error rates do not depend on the number of principal components (PCs) or on the weights included in the model

Introduction

Genome-wide association studies (GWAS) have become a popular approach for studying the genetic susceptibility of complex diseases. Genotyping chips used in GWAS can simultaneously assay hundreds of thousands or more single nucleotide polymorphisms (SNPs) across comparatively wide chromosomal regions. Association is then assessed by comparing the frequencies of genetic variants between cases and controls and estimating whether each locus is associated with the disease [1,2]. It is common to run single-locus association tests across the whole genome to identify causal SNPs with strong effects on disease. Such SNP-wise analysis, however, imposes a computational burden and raises the well-known problem of multiple testing [4].
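The single-locus case-control comparison and its multiple-testing cost can be sketched minimally. The sketch assumes an allelic test: a Pearson chi-square with 1 degree of freedom on allele counts, where each subject contributes two alleles. It also assumes a Bonferroni correction over all tested SNPs. The source does not specify either choice, and the helper names are hypothetical.

```python
import math

def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square (1 df) on the 2x2 table of allele counts.
    Rows: cases, controls. Columns: minor allele, major allele."""
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [case_minor + ctrl_minor, case_major + ctrl_major]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n          # count expected under no association
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

def bonferroni_threshold(alpha, n_tests):
    """Per-SNP significance level that controls the family-wise error rate."""
    return alpha / n_tests

# 200 cases and 200 controls give 400 alleles per group.
stat, p = allelic_chi2(150, 250, 100, 300)          # stat ≈ 14.55
threshold = bonferroni_threshold(0.05, 500_000)     # 1e-07 for 500,000 SNPs
significant = p < threshold                         # False
```

In this toy example the SNP is nominally significant (p is below 0.001) but does not survive correction for 500,000 tests. This illustrates why SNP-wise scans need very strong single-locus effects.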
