Abstract
Identification of causal genomic mutations that underlie disease phenotypes remains a key problem in the field of medical informatics. With the advent of new sequencing technologies and the decreasing cost of human genotyping, it is now possible to study genotype-phenotype associations at the population level, for example through genome-wide association studies (GWAS). However, due to large genomic variance and linkage disequilibrium, the genetic diversity of a complete human population cannot be captured by a limited number of clusters. Furthermore, current haplotype inference (phasing) methods are not reliable when applied to rare genomic variants, such as disease-related alleles. Hence, a satisfactory method for identifying deleterious mutations remains largely elusive. Here we present a non-parametric Bayesian model that jointly infers haplotypes and identifies deleterious mutations, taking into consideration genomic variance in the human population. The model is based on the Dirichlet process, which can capture genomic variance by modeling it with an unbounded number of clusters.
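To illustrate what an unbounded number of clusters means in a Dirichlet process, the following minimal sketch samples cluster labels from the Chinese restaurant process, a standard representation of the Dirichlet process. It is not the model presented here: the function name `crp_assignments`, the concentration parameter `alpha`, and the fixed seed are assumptions made for illustration only, and no haplotype or phenotype data is involved.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Illustrative only: alpha (> 0) is an assumed concentration parameter;
    larger values tend to produce more clusters.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already assigned to cluster k
    labels = []
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=200, alpha=1.0)
    print("clusters used:", len(counts))
    print("cluster sizes:", counts)
```

The number of clusters is not fixed in advance. Each new item can open a new cluster with probability proportional to `alpha`, so the number of clusters in use can keep growing as more individuals are added.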
Highlights
Identification of causal genomic mutations that underlie disease phenotypes remains a key problem in the field of medical informatics
Due to large genomic variance and linkage disequilibrium, genetic diversity of a complete human population cannot be captured by a limited number of clusters
The model is based on the Dirichlet process, which can capture genomic variance by modeling it with an unbounded number of clusters
Summary
Identification of causal genomic mutations that underlie disease phenotypes remains a key problem in the field of medical informatics.
Dirichlet process model for joint haplotype inference and GWAS. From Beyond the Genome 2012, Boston, MA, USA.