The effect of missing data on linkage disequilibrium mapping and haplotype association analysis in the GAW14 simulated datasets

Pamela A McCaskie, Kim W Carter, Lyle J Palmer, Simon R McCaskie

doi:10.1186/1471-2156-6-s1-s151

Abstract

We used our newly developed linkage disequilibrium (LD) plotting software, JLIN, to plot LD between pairs of single-nucleotide polymorphisms (SNPs) for three chromosomes of the Genetic Analysis Workshop 14 (GAW14) Aipotu simulated population, in order to assess the effect of missing data on LD calculations. Our haplotype analysis program, SIMHAP, was used to assess the effect of missing data on haplotype-phenotype association. Genotype data were removed at random at levels of 1%, 5%, and 10%, and the LD calculations and haplotype association results at each level of missingness were compared with those for the complete dataset. We concluded that ignoring individuals with missing data substantially affects the number of regions of LD detected, which in turn could affect the tagging SNPs chosen to generate haplotypes.
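The random removal of genotypes at a fixed level of missingness can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function names `mask_genotypes` and `complete_cases`, the 0/1/2 genotype coding, the toy data, and the choice to blank an exact fraction of all genotype calls are assumptions made for the example.

```python
import random


def mask_genotypes(genotypes, rate, seed=0):
    """Return a copy of `genotypes` with a fraction `rate` of calls set to None.

    `genotypes` is a list of rows (one per individual), each a list of
    genotype calls (one per SNP). The input is not modified.
    """
    rng = random.Random(seed)
    n_rows = len(genotypes)
    n_cols = len(genotypes[0]) if n_rows else 0
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    n_missing = round(rate * len(cells))
    masked = [list(row) for row in genotypes]
    # Sample cells without replacement so exactly n_missing calls are blanked.
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


def complete_cases(genotypes):
    """Keep only individuals with no missing genotype call."""
    return [row for row in genotypes if None not in row]


# Toy data (hypothetical): 200 individuals typed at 10 SNPs, coded 0/1/2.
toy_rng = random.Random(1)
toy = [[toy_rng.choice((0, 1, 2)) for _ in range(10)] for _ in range(200)]

for rate in (0.01, 0.05, 0.10):
    kept = complete_cases(mask_genotypes(toy, rate, seed=42))
    print(f"missing rate {rate:.0%}: {len(kept)} of {len(toy)} individuals complete")
```

Comparing the number of complete individuals across the three rates shows how quickly a complete-case analysis discards data as missingness rises.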

Highlights

As we begin to discover more about how haplotypes are defined and inherited, the emphasis in genetic association studies has moved away from the analysis of single nucleotide polymorphisms (SNPs) to incorporate multilocus haplotype analysis.
Using the linkage disequilibrium (LD) plotting program JLIN, we discovered that ignoring missing genotype data affects how accurately we map regions of LD and the level of strong LD observed.
Using plots of LD, we have shown that ignoring missing genotype data affects the LD coefficient D' and can increase the number of pairwise comparisons exhibiting strong LD.
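For reference, the LD coefficient D' between two biallelic SNPs can be computed from the four haplotype counts using its standard definition. The sketch below assumes phased haplotype counts are already available; it does not reproduce how JLIN derives haplotype frequencies from genotype data, and the function name `d_prime` and the example counts are hypothetical.

```python
def d_prime(n_AB, n_Ab, n_aB, n_ab):
    """Absolute D' for two biallelic SNPs from phased haplotype counts.

    The arguments are the counts of the four haplotypes formed by
    alleles A/a at the first SNP and B/b at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype must be observed")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    if d_max == 0:
        # A monomorphic SNP leaves D' undefined; 0.0 is returned here
        # purely as a convention for this sketch.
        return 0.0
    return abs(d) / d_max


print(d_prime(40, 10, 10, 40))  # all four haplotypes observed
print(d_prime(3, 0, 1, 2))      # small sample with one haplotype unobserved
```

A general property of D' is that it equals 1 whenever one of the four haplotypes is unobserved and both SNPs are polymorphic. Rare haplotypes are more likely to go unobserved in smaller samples, which is one plausible way that discarding individuals could raise the number of SNP pairs showing strong LD.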

Summary

Introduction

As we begin to discover more about how haplotypes are defined and inherited, the emphasis in genetic association studies has moved away from the analysis of single nucleotide polymorphisms (SNPs) to incorporate multilocus haplotype analysis. Haplotypes should offer greater statistical power to detect a true association at a given sample size than analyses based on single SNPs or combinations of SNPs [1-3], because they contain more genetic information than the genotypes alone. It is increasingly clear from other fields of statistical investigation that ignoring missing data or restricting the analysis to subjects with complete data (even when data are missing completely at random) can lead to biased or inefficient analyses [4-8]. In the case of missing genetic data, the effects on LD analysis and subsequent haplotype formation can be substantial, depending on the amount of data missing.
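To give a sense of scale for the cost of restricting an analysis to complete cases: if each genotype call is missing independently with probability p, an individual has complete data at m SNPs with probability (1 - p)^m. The short sketch below evaluates this expression. The independence assumption, the function name `expected_complete_fraction`, and the SNP counts used are illustrative choices, not figures from the study.

```python
def expected_complete_fraction(missing_rate, n_snps):
    """Expected fraction of individuals with no missing call at n_snps SNPs,
    assuming each call is missing independently with probability missing_rate."""
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must lie in [0, 1]")
    if n_snps < 0:
        raise ValueError("n_snps must be non-negative")
    return (1.0 - missing_rate) ** n_snps


for rate in (0.01, 0.05, 0.10):
    for n_snps in (2, 10, 50):
        frac = expected_complete_fraction(rate, n_snps)
        print(f"rate={rate:.2f}, SNPs={n_snps:>2}: {frac:.3f} expected complete")
```

Because the fraction shrinks geometrically with the number of SNPs, even a low per-genotype missing rate can remove a large share of individuals from a multilocus analysis.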

Journal: BMC Genetics
Publication Date: Dec 1, 2005
Citations: 24
License type: CC BY
