Abstract

Complex medical disorders, such as heart disease and diabetes, are thought to involve a number of genes which act in conjunction with lifestyle and environmental factors to increase disease susceptibility. Associations between complex traits and single nucleotide polymorphisms (SNPs) in candidate genomic regions can provide a useful tool for identifying genetic risk factors. However, analysis of trait associations with single SNPs ignores the potential for extra information from haplotypes, combinations of variants at multiple SNPs along a chromosome inherited from a parent. When haplotype-trait associations are of interest and haplotypes of individuals can be determined, generalized linear models (GLMs) may be used to investigate haplotype associations while adjusting for the effects of non-genetic cofactors or attributes. Unfortunately, haplotypes cannot always be determined cost-effectively when data is collected on unrelated subjects. Uncertain haplotypes may be inferred on the basis of data from single SNPs. However, subsequent analyses of risk factors must account for the resulting uncertainty in haplotype assignment in order to avoid potential errors in interpretation. To account for such uncertainty, we have developed hapassoc, software for R implementing a likelihood approach to inference of haplotype and non-genetic effects in GLMs of trait associations. We provide a description of the underlying statistical method and illustrate the use of hapassoc with examples that highlight the flexibility to specify dominant and recessive effects of genetic risk factors, a feature not shared by other software that restricts users to additive effects only. Additionally, hapassoc can accommodate missing SNP genotypes for limited numbers of subjects.

Highlights

  • The identification of genetic factors influencing susceptibility to complex diseases such as cancer and diabetes is important for improving our understanding of disease pathways

  • Genetic factors are measured at multiple sites of known genomic location to determine if variants at these sites are associated with the disease trait

  • When there are no more than maxMissingGenos single nucleotide polymorphisms (SNPs) with missing genotype data, rows corresponding to haplotype configurations compatible with the missing genotype data for a subject are added to the end of the augmented data matrices haploDM and nonhaploDM returned by pre.hapassoc
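The row expansion described in the last highlight can be illustrated with a small sketch. It is written in Python for illustration only and is not hapassoc's R code. The function name expand_missing_genotypes, the coding of a genotype as the count of the "1" allele (0, 1 or 2), the use of None for a missing genotype, and the decision to return no rows for a subject above the limit are all assumptions made here, not details taken from the package.

```python
from itertools import product


def expand_missing_genotypes(genotypes, max_missing=1):
    """List every complete genotype vector compatible with a subject's data.

    genotypes   : per-SNP counts of the '1' allele (0, 1 or 2), None if missing.
    max_missing : hypothetical stand-in for the maxMissingGenos limit.

    Assumption: a subject with more than max_missing missing SNPs yields
    no rows here; this section does not say how such subjects are handled.
    """
    missing = [i for i, g in enumerate(genotypes) if g is None]
    if len(missing) > max_missing:
        return []

    completed = []
    # Try each of the three possible genotypes at every missing SNP.
    for fill in product((0, 1, 2), repeat=len(missing)):
        row = list(genotypes)
        for idx, value in zip(missing, fill):
            row[idx] = value
        completed.append(tuple(row))
    return completed


if __name__ == "__main__":
    # One missing SNP out of three gives three candidate genotype vectors.
    for row in expand_missing_genotypes([1, None, 2]):
        print(row)
```

Each completed genotype vector would still need to be resolved into compatible haplotype pairs before it could contribute rows to an augmented data set.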


Summary

Introduction and Background

The identification of genetic factors influencing susceptibility to complex diseases such as cancer and diabetes is important for improving our understanding of disease pathways. The three possible genotypes for a SNP marker are 0/0, 0/1 and 1/1, where “/” separates the alleles inherited from each parent. When associations between haplotypes and disease outcomes or traits are of interest, a potential difficulty is that haplotype phase is not necessarily known because, typically, genotypes are measured only at individual markers. The analysis of haplotype-trait associations therefore involves handling the missing phase data. GLMs provide the flexibility to incorporate non-genetic risk factors or potential confounding variables, such as age or sex, as covariates. They also allow interactions between genetic and environmental risk factors to be modelled, a current research focus in the study of complex diseases. The statistical approach implemented in hapassoc is briefly described, and the use of the program is illustrated with examples.
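To make the missing-phase problem concrete, the following minimal sketch enumerates the haplotype pairs that are consistent with a set of unphased SNP genotypes. It is written in Python for illustration and does not reproduce hapassoc's R implementation. The function name compatible_haplotype_pairs and the coding of genotypes 0/0, 0/1 and 1/1 as the counts 0, 1 and 2 are assumptions introduced here.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """Return the unordered haplotype pairs consistent with unphased genotypes.

    genotypes: per-SNP counts of the '1' allele, i.e. 0/0 -> 0, 0/1 -> 1,
    1/1 -> 2 (a coding assumed for this sketch).
    """
    per_snp = []
    for g in genotypes:
        if g == 0:
            per_snp.append([("0", "0")])
        elif g == 2:
            per_snp.append([("1", "1")])
        elif g == 1:
            # Heterozygous site: either parent may have passed on the '1'.
            per_snp.append([("0", "1"), ("1", "0")])
        else:
            raise ValueError(f"unexpected genotype code: {g!r}")

    pairs = set()
    for combo in product(*per_snp):
        h1 = "".join(a for a, _ in combo)
        h2 = "".join(b for _, b in combo)
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


if __name__ == "__main__":
    # Heterozygous at both SNPs: phase is ambiguous between two pairs.
    print(compatible_haplotype_pairs([1, 1]))
    # Heterozygous at one SNP only: phase is fully determined.
    print(compatible_haplotype_pairs([1, 0, 2]))
```

A subject who is heterozygous at no more than one SNP has a single compatible pair, so phase is known. Each additional heterozygous SNP doubles the number of compatible pairs, and this is the uncertainty that a likelihood-based analysis has to account for.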

Statistical Description
EM algorithm implemented in hapassoc
At the tth iteration:
Using hapassoc and its features
Logistic regression with input genotypes in “allelic” format
Linear regression with input genotypes in “genotypic” format