Reply to Marchini and Howie

D.Y. Lin, Y. Hu

doi:10.1016/j.ajhg.2008.09.008


Open Access


Abstract

To the Editor: As noted by Marchini and Howie (MH), an advantage of our maximum likelihood (ML) approach is that the genotypes of untyped SNPs are inferred from proper posterior distributions. The two-stage approach, which ignores the phenotype information in the imputation of genotypes, can yield biased estimates of genetic effects near disease loci and consequently reduce power, especially when the genetic effects are strong. It is difficult to fully account for the uncertainties of the imputed genotypes in the two-stage approach, especially if environmental covariates are involved.

From a frequentist point of view, it is impossible to do better than the ML approach, which has the highest statistical efficiency among all valid methods (that use the same data and make the same assumptions). The two-stage approach might produce more accurate results than the ML approach in certain situations because it allows the use of sophisticated population-genetics models in the first stage. The ML approach is more robust, in that it estimates the joint distribution between the untyped SNP and the flanking markers nonparametrically. Although we use a small number of flanking markers, we search over all subsets of flanking markers around the untyped SNP and select the subset that provides the best prediction of genotypes at the untyped SNP. By searching over all possible subsets of four SNPs among the 20 SNPs closest to each untyped HapMap SNP, we can typically obtain R_s^2 of 1 for more than 50% of untyped SNPs and R_s^2 of > 0.9 for 80% of untyped SNPs. It is unclear how much improvement sophisticated population-genetics models can bring.

MH are absolutely right that our simulation studies did not evaluate the role of sophisticated population-genetics models. Indeed, we stated this fact in the Discussion of our article. Our simulation studies were designed to compare the ML and two-stage approaches when the same set of flanking markers is used. The results showed the efficiency gain of the ML approach due to the use of the phenotype information when inferring unobserved genotypes and the use of retrospective likelihood for reflecting case-control sampling. When applying the ML method to real data, we always search over a large region around each untyped SNP to find a set of flanking markers that provides the best prediction of genotypes for the untyped SNP.

We are intrigued by the comparisons between SNPMStat and IMPUTE/SNPTEST reported by MH. However, it is difficult to draw any firm conclusion from a small number of selective data sets. The results for the Rheumatoid Arthritis study shown in Figure 1 of MH were based on a subset of the HapMap SNPs that was originally posted on our website for the users to test our software. As mentioned by MH, we recently updated the reference panel to include all of the HapMap SNPs. With this more realistic reference panel, the results of SNPMStat and IMPUTE/SNPTEST are very similar; see our Figure 1. For this example, SNPMStat was ten times faster than IMPUTE/SNPTEST. It is unclear how representative the two examples shown in Figures 2 and 3 of MH are or how robust the results of IMPUTE/SNPTEST are to the choices of parameters used in the population-genetics model. It does not seem possible for an imputation method with correct type I error to always produce p values at untyped SNPs that are much smaller than those at typed SNPs. The comparisons on the p value scale might exaggerate the differences between competing methods, because a small difference in the test statistic at the extreme tail(s) of the distribution translates into a substantial difference in the p value. As noted by MH, it would be preferable to compare the ML and two-stage approaches through extensive simulation studies with realistic SNP landscapes and disease effect sizes.

Figure 1. Results of Running SNPMStat and IMPUTE/SNPTEST on the Simulated Rheumatoid Arthritis Study Data when the Reference Panel Contains all of the HapMap SNPs

The −log10 p values under the additive model for the genotyped and untyped SNPs are shown in black and red dots, respectively.
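
The remark above about comparisons on the p value scale can be illustrated numerically. The following is a minimal sketch, not taken from the letter or from MH's figures: it assumes a test statistic that is approximately standard normal under the null hypothesis, and the z values are hypothetical, chosen only to contrast the center of the distribution with the extreme tail. The function names `two_sided_p` and `fold_change` are introduced here for illustration.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p value for a statistic z assumed standard normal under the null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def fold_change(z_low: float, z_high: float) -> float:
    """Ratio of p values when the statistic moves from z_low to z_high."""
    return two_sided_p(z_low) / two_sided_p(z_high)


# Hypothetical starting values; each is shifted by the same amount (0.5).
for z_low in (1.0, 3.0, 5.0):
    z_high = z_low + 0.5
    print(
        f"z {z_low:.1f} -> {z_high:.1f}: "
        f"p {two_sided_p(z_low):.3g} -> {two_sided_p(z_high):.3g}, "
        f"fold change {fold_change(z_low, z_high):.1f}"
    )
```

Under these assumptions, the same shift of 0.5 in the statistic changes the p value by a much larger factor at z = 5 than at z = 1, so two methods whose test statistics differ only slightly can look quite different when plotted as −log10 p values.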

Journal: The American Journal of Human Genetics	Publication Date: Oct 1, 2008
Citations: 1	License type: publisher-specific-oa
