Abstract

Genome-wide association studies (GWAS) have helped to reveal genetic mechanisms of complex diseases. Although commonly used genotyping technology enables us to determine up to a million single-nucleotide polymorphisms (SNPs), causative variants are typically not genotyped directly. A favored approach to increase the power of genome-wide association studies is to impute the untyped SNPs using more complete genotype data of a reference population. Random forests (RF) provide an internal method for replacing missing genotypes. A forest of classification trees is used to determine similarities of probands regarding their genotypes. These proximities are then used to impute genotypes of untyped SNPs. We evaluated this approach using genotype data of the Framingham Heart Study provided as Problem 2 for Genetic Analysis Workshop 16 and the Caucasian HapMap samples as reference population. Our results indicate that RFs are faster but less accurate than alternative approaches for imputing untyped SNPs.
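The proximity-based imputation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard random-forest definition of proximity (the fraction of trees in which two samples land in the same terminal node) and a proximity-weighted vote over reference genotypes. The function names, the toy leaf assignments, and the 0/1/2 genotype coding are hypothetical. In practice the leaf indices would come from a fitted forest, for example from the `apply` method of scikit-learn's `RandomForestClassifier`.

```python
from collections import defaultdict


def proximity(leaves_a, leaves_b):
    """Fraction of trees in which two samples share a terminal node."""
    shared = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return shared / len(leaves_a)


def impute_genotype(target_leaves, reference_leaves, reference_genotypes):
    """Impute one untyped SNP by a proximity-weighted vote over reference samples."""
    votes = defaultdict(float)
    for leaves, genotype in zip(reference_leaves, reference_genotypes):
        votes[genotype] += proximity(target_leaves, leaves)
    # If the target shares no terminal node with any reference sample,
    # all weights are zero: fall back to the most common reference genotype.
    if not any(votes.values()):
        for genotype in reference_genotypes:
            votes[genotype] += 1.0
    # Ties are broken in favor of the smallest genotype code.
    return max(sorted(votes), key=votes.get)


# Hypothetical toy data: terminal-node index per tree (3 trees)
# for 4 reference samples, plus their genotypes at the untyped SNP.
reference_leaves = [[1, 4, 2], [1, 4, 3], [2, 5, 3], [2, 5, 1]]
reference_genotypes = [0, 0, 2, 1]  # assumed coding: minor-allele count

# Terminal nodes reached by a study sample whose genotype is missing.
target_leaves = [1, 4, 3]
imputed = impute_genotype(target_leaves, reference_leaves, reference_genotypes)
print(imputed)
```

In this toy case the study sample shares most terminal nodes with the two reference samples carrying genotype 0, so that genotype receives the largest weight. A real analysis would build the forest on the typed SNPs and repeat the vote for every untyped SNP.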

Highlights

  • Genome-wide association studies (GWAS) have expanded our knowledge about genomic variants that influence susceptibility to complex diseases such as myocardial infarction [1,2]

  • Our results indicate that random forests (RF) are faster but less accurate than alternative approaches for imputing untyped single-nucleotide polymorphisms (SNPs)

  • Detecting association through a nearby SNP in high linkage disequilibrium (LD) has proved successful in many cases, but it is still likely that a great number of causal variants remain undetected and that the power of genome-wide association studies (GWAS) could be increased by performing statistical tests on disease-influencing SNPs directly [3] (BMC Proceedings 2009, 3(Suppl 7):S65, http://www.biomedcentral.com/1753-6561/3/S7/S65)

Introduction

Genome-wide association studies (GWAS) have expanded our knowledge about genomic variants that influence susceptibility to complex diseases such as myocardial infarction [1,2]. About 15 million SNPs are known in the current Build 129 of dbSNP http://www.ncbi.nlm.nih.gov/SNP, and almost four million of these are available in release 23a from the HapMap project http://www. Today’s GWAS are usually not able to genotype causal variants but will detect association with a nearby SNP in high linkage disequilibrium (LD). Although this approach has proved successful in many cases, it is still likely that a great number of causal variants remain undetected and that the power of GWAS could be increased by performing statistical tests on disease-influencing SNPs directly [3].
