Abstract

We propose a semiparametric approach for the analysis of case–control genome-wide association study. Parametric components are used to model both the conditional distribution of the case status given the covariates and the distribution of genotype counts, whereas the distribution of the covariates are modelled nonparametrically. This yields a direct and joint modelling of the case status, covariates and genotype counts, and gives a better understanding of the disease mechanism and results in more reliable conclusions. Side information, such as the disease prevalence, can be conveniently incorporated into the model by an empirical likelihood approach and leads to more efficient estimates and a powerful test in the detection of disease-associated SNPs. Profiling is used to eliminate a nuisance nonparametric component, and the resulting profile empirical likelihood estimates are shown to be consistent and asymptotically normal. For the hypothesis test on disease association, we apply the approximate Bayes factor (ABF) which is computationally simple and most desirable in genome-wide association studies where hundreds of thousands to a million genetic markers are tested. We treat the approximate Bayes factor as a hybrid Bayes factor which replaces the full data by the maximum likelihood estimates of the parameters of interest in the full model and derive it under a general setting. The deviation from Hardy–Weinberg Equilibrium (HWE) is also taken into account and the ABF for HWE using cases is shown to provide evidence of association between a disease and a genetic marker. Simulation studies and an application are further provided to illustrate the utility of the proposed methodology.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call