Abstract

Haplotype association studies based on family genotype data can provide more biological information than single marker association studies. Difficulties arise, however, in the inference of haplotype phase determination and in haplotype transmission/non-transmission status. Incorporation of the uncertainty associated with haplotype inference into regression models requires special care. This task can get even more complicated when the genetic region contains a large number of haplotypes. To avoid the curse of dimensionality, we employ a clustering algorithm based on the evolutionary relationship among haplotypes and retain for regression analysis only the ancestral core haplotypes identified by it. To integrate the three sources of variation, phase ambiguity, transmission status and ancestral uncertainty, we propose an uncertainty-coding matrix which combines these three types of variability simultaneously. Next we evaluate haplotype risk with the use of such a matrix in a Bayesian conditional logistic regression model. Simulation studies and one application, a schizophrenia multiplex family study, are presented and the results are compared with those from other family based analysis tools such as FBAT. Our proposed method (Bayesian regression using uncertainty-coding matrix, BRUCM) is shown to perform better and the implementation in R is freely available.

Highlights

  • Many genetic studies of complex diseases are interested in detecting associations between genetic markers and disease status

  • In family studies with collected genotype data, the inference of haplotype risk requires the determination of haplotype phase and corresponding transmission and non-transmission status

  • This matrix was used in a Bayesian conditional logistic regression model to examine the existence of haplotype risk

Read more

Summary

Introduction

Many genetic studies of complex diseases are interested in detecting associations between genetic markers and disease status. To evaluate the strength of such association, a regression approach may be adopted and applied to family haplotype data. Advantages of this regression framework include the ability to estimate and test the association, and its flexibility in accommodating individual information, and gene-gene and gene-environment interactions. The second group of remedies, in contrast, included the set of all possible haplotype configurations compatible with the observed genotype, constructed the corresponding likelihood for each haplotype explanation, and put weights on these likelihoods or log-likelihoods to establish a full likelihood function for case-control studies [6,7]. For the family data here, we preserve the uncertainty in haplotype configurations with a rationale similar to that of the second group of remedies

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call