Abstract
To the Editor: As noted by Marchini and Howie (MH), an advantage of our maximum likelihood (ML) approach is that the genotypes of untyped SNPs are inferred from proper posterior distributions. The two-stage approach ignores the phenotype information when imputing genotypes. It can therefore yield biased estimates of genetic effects near disease loci and consequently reduce power, especially when the genetic effects are strong. It is also difficult to fully account for the uncertainty in the imputed genotypes in the two-stage approach, especially when environmental covariates are involved.

From a frequentist point of view, it is impossible to do better than the ML approach, which has the highest statistical efficiency among all valid methods that use the same data and make the same assumptions. The two-stage approach might produce more accurate results than the ML approach in certain situations because it allows the use of sophisticated population-genetics models in the first stage. The ML approach is more robust, in that it estimates the joint distribution of the untyped SNP and the flanking markers nonparametrically. Although we use a small number of flanking markers, we search over all subsets of flanking markers around the untyped SNP and select the subset that provides the best prediction of genotypes at the untyped SNP. By searching over all possible subsets of four SNPs among the 20 SNPs closest to each untyped HapMap SNP, we can typically obtain an R_s^2 of 1 for more than 50% of untyped SNPs and an R_s^2 above 0.9 for 80% of untyped SNPs. It is unclear how much improvement sophisticated population-genetics models can bring.

MH are absolutely right that our simulation studies did not evaluate the role of sophisticated population-genetics models; indeed, we stated this fact in the Discussion of our article. Our simulation studies were designed to compare the ML and two-stage approaches when the same set of flanking markers is used.
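The flanking-marker search mentioned above (every four-SNP subset among the 20 SNPs closest to an untyped SNP, keeping the subset with the best R_s^2) can be sketched as follows. This is a minimal illustration under stated assumptions, not SNPMStat's code: it assumes phased reference haplotypes coded 0/1, estimates P(untyped allele | flanking haplotype) by simple counting (a nonparametric estimate), and takes R_s^2 to be the squared correlation between the true allele and that conditional mean, computed in-sample on the reference panel. The function names `r2_for_subset` and `best_flanking_subset` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def r2_for_subset(haplotypes, target, subset):
    """Squared correlation between the allele at `target` and its
    nonparametric conditional mean given the alleles at `subset`."""
    # Hypothetical helper: tally (count, count of allele 1 at target)
    # for each distinct flanking-marker haplotype.
    tally = defaultdict(lambda: [0, 0])
    keys = [tuple(h[j] for j in subset) for h in haplotypes]
    for key, h in zip(keys, haplotypes):
        tally[key][0] += 1
        tally[key][1] += h[target]
    y = [h[target] for h in haplotypes]
    # Prediction = empirical P(target allele = 1 | flanking haplotype).
    yhat = [tally[k][1] / tally[k][0] for k in keys]
    n = len(y)
    my, mp = sum(y) / n, sum(yhat) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, yhat))
    vy = sum((a - my) ** 2 for a in y)
    vp = sum((b - mp) ** 2 for b in yhat)
    if vy == 0 or vp == 0:
        return 0.0  # monomorphic target or uninformative subset
    return cov * cov / (vy * vp)


def best_flanking_subset(haplotypes, positions, target, n_nearest=20, k=4):
    """Score every k-subset of the n_nearest markers closest to `target`
    and return (best_subset, best_r2). Hypothetical helper."""
    others = [j for j in range(len(positions)) if j != target]
    others.sort(key=lambda j: abs(positions[j] - positions[target]))
    candidates = others[:n_nearest]
    best_subset, best_r2 = (), -1.0
    for subset in combinations(candidates, min(k, len(candidates))):
        r2 = r2_for_subset(haplotypes, target, subset)
        if r2 > best_r2:
            best_subset, best_r2 = subset, r2
    return best_subset, best_r2
```

With the defaults, each untyped SNP requires scoring C(20, 4) = 4845 subsets. Because the score is computed on the same haplotypes used to estimate the conditional frequencies, it is optimistic when some flanking haplotypes are rare; a practical version might score subsets on held-out data instead.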
Our simulation results showed the efficiency gain of the ML approach that comes from using the phenotype information when inferring unobserved genotypes and from using a retrospective likelihood that reflects case-control sampling. When applying the ML method to real data, we always search over a large region around each untyped SNP to find the set of flanking markers that provides the best prediction of genotypes for that SNP.

We are intrigued by the comparisons between SNPMStat and IMPUTE/SNPTEST reported by MH. However, it is difficult to draw any firm conclusion from a small number of selected data sets. The results for the Rheumatoid Arthritis study shown in Figure 1 of MH were based on a subset of the HapMap SNPs that was originally posted on our website for users to test our software. As mentioned by MH, we recently updated the reference panel to include all of the HapMap SNPs. With this more realistic reference panel, the results of SNPMStat and IMPUTE/SNPTEST are very similar (see our Figure 1), and for this example SNPMStat was ten times faster than IMPUTE/SNPTEST. It is unclear how representative the two examples shown in Figures 2 and 3 of MH are, or how robust the IMPUTE/SNPTEST results are to the choice of parameters in the population-genetics model. It does not seem possible for an imputation method with correct type I error to always produce p values at untyped SNPs that are much smaller than those at typed SNPs. Comparisons on the p value scale might exaggerate the differences between competing methods, because a small difference in the test statistic in the extreme tail(s) of the distribution translates into a substantial difference in the p value.
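The tail effect described above can be illustrated numerically. The sketch below assumes, purely for illustration, that the association test is a 1-df chi-square statistic (for example, a test of an additive effect); the letter does not specify this, and the helper names are hypothetical. The same 10% increase in the statistic shrinks the p value only about 1.2-fold near the bulk of the distribution (3.0 to 3.3) but almost 5-fold in the tail (30 to 33).

```python
from math import erfc, log10, sqrt


def chi2_1df_pvalue(stat):
    """Upper-tail p value of a 1-df chi-square statistic (Z**2)."""
    # P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2.0))


def pvalue_ratio(stat_a, stat_b):
    """How many times smaller the p value for stat_b is than for stat_a."""
    return chi2_1df_pvalue(stat_a) / chi2_1df_pvalue(stat_b)


# The same 10% increase in the statistic, in the bulk and in the tail.
bulk = pvalue_ratio(3.0, 3.3)
tail = pvalue_ratio(30.0, 33.0)

for a, b in [(3.0, 3.3), (30.0, 33.0)]:
    print(f"chi2 {a} -> {b}: -log10 p "
          f"{-log10(chi2_1df_pvalue(a)):.2f} -> {-log10(chi2_1df_pvalue(b)):.2f}")
```

On a −log10 p plot, the tail pair differs by well over half a unit while the bulk pair barely moves, even though the underlying statistics differ by the same proportion.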
As noted by MH, it would be preferable to compare the ML and two-stage approaches through extensive simulation studies with realistic SNP landscapes and disease effect sizes.

Figure 1. Results of Running SNPMStat and IMPUTE/SNPTEST on the Simulated Rheumatoid Arthritis Study Data when the Reference Panel Contains All of the HapMap SNPs
The −log10 p values under the additive model for the genotyped and untyped SNPs are shown as black and red dots, respectively.