Multiple testing in genome-wide association studies via hidden Markov models

Zhi Wei, Kai Wang, Hakon Hakonarson, Wenguang Sun

doi:10.1093/bioinformatics/btp476

Abstract

Genome-wide association studies (GWAS) interrogate common genetic variation across the entire human genome in an unbiased manner and hold promise in identifying genetic variants with moderate or weak effect sizes. However, conventional testing procedures, which are mostly P-value based, ignore the dependency among SNPs and therefore suffer from a loss of efficiency. The goal of this article is to exploit the dependency information among adjacent single nucleotide polymorphisms (SNPs) to improve the screening efficiency in GWAS.

We propose to model the linear block dependency in the SNP data using hidden Markov models (HMMs). A compound decision-theoretic framework for testing HMM-dependent hypotheses is developed. We propose a powerful data-driven procedure, the pooled local index of significance (PLIS), that controls the false discovery rate (FDR) at the nominal level. PLIS is shown to be optimal in the sense that it has the smallest false negative rate (FNR) among all valid FDR procedures. By re-ranking significance for all SNPs with dependency considered, PLIS gains higher power than conventional P-value based methods. Simulation results demonstrate that PLIS dominates conventional FDR procedures in detecting disease-associated SNPs.

Our method is applied to the analysis of SNP data from a GWAS of type 1 diabetes. Compared with the Benjamini-Hochberg (BH) procedure, PLIS yields more accurate results and has better reproducibility of findings. The genomic rankings based on our procedure are substantially different from the rankings based on the P-values. By integrating information from adjacent locations, the PLIS rankings benefit from an increased signal-to-noise ratio; hence, our procedure often has higher statistical power and better reproducibility. It provides a promising direction for large-scale GWAS.

An R package, PLIS, has been developed to implement the PLIS procedure. Source code is available upon request and will be available on CRAN (http://cran.r-project.org/).

Contact: zhiwei@njit.edu

Supplementary data are available at Bioinformatics online.
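To make the idea of ranking SNPs by an HMM-based significance index more concrete, here is a minimal Python sketch. It is not the PLIS R package and not the authors' code. It assumes a two-state HMM (0 = null, 1 = disease-associated) observed through z-scores, with a standard normal null density and a single normal non-null density whose parameters are taken as known; in practice these would have to be estimated. The function names `lis_two_state_hmm` and `lis_step_up`, and all numeric values, are hypothetical. The index computed is the posterior probability that a SNP is null given all z-scores on its chromosome. The thresholding rule is the usual step-up on the running average of sorted posterior null probabilities. Pooling is illustrated by concatenating the per-chromosome indices before thresholding.

```python
import numpy as np


def normal_pdf(z, mean, sd):
    """Density of N(mean, sd^2) evaluated at z."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def lis_two_state_hmm(z, trans, init, alt_mean, alt_sd):
    """Posterior null probability P(state_i = 0 | all z) for one chromosome.

    Assumed model (illustrative only): state 0 emits N(0, 1) and
    state 1 emits N(alt_mean, alt_sd^2); trans[s, t] = P(next = t | current = s).
    """
    z = np.asarray(z, dtype=float)
    trans = np.asarray(trans, dtype=float)
    init = np.asarray(init, dtype=float)
    n = z.size
    emis = np.column_stack(
        [normal_pdf(z, 0.0, 1.0), normal_pdf(z, alt_mean, alt_sd)]
    )

    # Forward pass, rescaled at each step to avoid numerical underflow.
    fwd = np.zeros((n, 2))
    fwd[0] = init * emis[0]
    fwd[0] /= fwd[0].sum()
    for i in range(1, n):
        fwd[i] = (fwd[i - 1] @ trans) * emis[i]
        fwd[i] /= fwd[i].sum()

    # Backward pass, also rescaled; only ratios within a row matter.
    bwd = np.ones((n, 2))
    for i in range(n - 2, -1, -1):
        bwd[i] = trans @ (emis[i + 1] * bwd[i + 1])
        bwd[i] /= bwd[i].sum()

    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]


def lis_step_up(lis, alpha):
    """Reject the k smallest indices, where k is the largest count whose
    running average of sorted indices stays at or below alpha."""
    lis = np.asarray(lis, dtype=float)
    order = np.argsort(lis)
    running_mean = np.cumsum(lis[order]) / np.arange(1, lis.size + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    reject = np.zeros(lis.size, dtype=bool)
    if passing.size:
        reject[order[: passing[-1] + 1]] = True
    return reject


# Toy usage: two hypothetical "chromosomes" with made-up z-scores.
trans = np.array([[0.95, 0.05], [0.20, 0.80]])
init = np.array([0.9, 0.1])
chrom_a = [0.1, 3.0, 3.2, 2.8, -0.2]
chrom_b = [-0.5, 0.3, 1.9, 2.4, 0.0]
lis_a = lis_two_state_hmm(chrom_a, trans, init, alt_mean=3.0, alt_sd=1.0)
lis_b = lis_two_state_hmm(chrom_b, trans, init, alt_mean=3.0, alt_sd=1.0)

# Pool the indices from all chromosomes, then threshold once.
pooled = np.concatenate([lis_a, lis_b])
rejected = lis_step_up(pooled, alpha=0.1)
```

Because the index for each SNP depends on its neighbours through the forward and backward passes, a moderate z-score next to strong signals can rank above an isolated z-score of the same size, which is the kind of re-ranking the abstract describes. A P-value based rule such as BH would rank the two identically.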

Journal: Bioinformatics
Publication Date: May 4, 2009
Citations: 87
