Using an Uncertainty-Coding Matrix in Bayesian Regression Models for Haplotype-Specific Risk Detection in Family Association Studies

Yung-Hsiang Huang, Mei-Hsien Lee, Wei J Chen, Chuhsing Kate Hsiao

doi:10.1371/journal.pone.0021890

Open Access

Abstract

Haplotype association studies based on family genotype data can provide more biological information than single-marker association studies. Difficulties arise, however, in inferring haplotype phase and haplotype transmission/non-transmission status. Incorporating the uncertainty associated with haplotype inference into regression models requires special care, and the task becomes even more complicated when the genetic region contains a large number of haplotypes. To avoid the curse of dimensionality, we employ a clustering algorithm based on the evolutionary relationship among haplotypes and retain for regression analysis only the ancestral core haplotypes it identifies. To integrate the three sources of variation (phase ambiguity, transmission status, and ancestral uncertainty), we propose an uncertainty-coding matrix that combines them simultaneously. We then evaluate haplotype risk by using this matrix in a Bayesian conditional logistic regression model. Simulation studies and one application, a schizophrenia multiplex family study, are presented, and the results are compared with those from other family-based analysis tools such as FBAT. Our proposed method (Bayesian regression using uncertainty-coding matrix, BRUCM) is shown to perform better, and the implementation in R is freely available.

Highlights

Many genetic studies of complex diseases are interested in detecting associations between genetic markers and disease status
In family studies with collected genotype data, the inference of haplotype risk requires the determination of haplotype phase and corresponding transmission and non-transmission status
An uncertainty-coding matrix was used in a Bayesian conditional logistic regression model to examine the existence of haplotype risk

Summary

Introduction

Many genetic studies of complex diseases are interested in detecting associations between genetic markers and disease status. To evaluate the strength of such an association, a regression approach may be adopted and applied to family haplotype data. Advantages of this regression framework include the ability to estimate and test the association and the flexibility to accommodate individual-level information as well as gene-gene and gene-environment interactions. The second group of remedies, in contrast, included the set of all possible haplotype configurations compatible with the observed genotype, constructed the corresponding likelihood for each haplotype explanation, and put weights on these likelihoods or log-likelihoods to establish a full likelihood function for case-control studies [6,7]. For the family data here, we preserve the uncertainty in haplotype configurations with a rationale similar to that of the second group of remedies.
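The idea of keeping every haplotype configuration compatible with an observed genotype, and attaching a weight to each, can be illustrated with a small sketch. This is not the BRUCM implementation (which the authors provide in R). The function names, the 0/1/2 genotype coding for biallelic markers, the haplotype frequency table, and the use of Hardy-Weinberg proportions to form the weights are all assumptions made for illustration only.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: minor-allele counts (0, 1, or 2) at each biallelic marker
    (hypothetical coding chosen for this sketch).
    Returns a sorted list of (hap_a, hap_b) tuples of 0/1 strings.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed on both haplotypes; heterozygous sites vary.
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        hap_a, hap_b = base[:], base[:]
        for site, bit in zip(het_sites, bits):
            hap_a[site], hap_b[site] = bit, 1 - bit
        a = "".join(map(str, hap_a))
        b = "".join(map(str, hap_b))
        pairs.add(tuple(sorted((a, b))))  # treat (a, b) and (b, a) as one pair
    return sorted(pairs)


def configuration_weights(genotype, hap_freq):
    """Weight each compatible pair, assuming Hardy-Weinberg proportions.

    hap_freq: dict mapping haplotype string -> assumed population frequency.
    Returns a dict mapping each pair to its normalized weight.
    """
    pairs = compatible_haplotype_pairs(genotype)
    raw = []
    for a, b in pairs:
        p = hap_freq.get(a, 0.0) * hap_freq.get(b, 0.0)
        raw.append(p if a == b else 2.0 * p)
    total = sum(raw)
    if total == 0:
        raise ValueError("no compatible pair has positive assumed frequency")
    return {pair: w / total for pair, w in zip(pairs, raw)}


# Illustrative frequencies only; they are not taken from the paper.
example_freq = {"00": 0.4, "11": 0.3, "01": 0.2, "10": 0.1}
example_weights = configuration_weights((1, 1), example_freq)
```

A double heterozygote such as `(1, 1)` is the simplest ambiguous case: it is compatible with two phase configurations, and the weights express how plausible each one is under the assumed frequencies. In a weighted-likelihood approach, weights of this kind would multiply the likelihood contribution of each configuration.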

Methods

Results

Conclusion