Abstract

We used our newly developed linkage disequilibrium (LD) plotting software, JLIN, to plot LD between pairs of single-nucleotide polymorphisms (SNPs) for three chromosomes of the Genetic Analysis Workshop 14 Aipotu simulated population, in order to assess the effect of missing data on LD calculations. Our haplotype analysis program, SIMHAP, was used to assess the effect of missing data on haplotype-phenotype association. Genotype data were removed at random at levels of 1%, 5%, and 10%, and the LD calculations and haplotype association results at each level of missingness were compared with those for the complete dataset. We conclude that ignoring individuals with missing data substantially affects the number of regions of LD detected, which in turn could affect the tagging SNPs chosen to generate haplotypes.

Highlights

  • As we begin to discover more about how haplotypes are defined and inherited, the emphasis in genetic association studies has moved away from the analysis of single nucleotide polymorphisms (SNPs) to incorporate multilocus haplotype analysis

  • Using the linkage disequilibrium (LD) plotting program JLIN, we discovered that ignoring missing genotype data affects how accurately we map regions of LD, and the level of strong LD observed

  • We have shown, using plots of LD, that ignoring missing genotype data affects the LD coefficient D' and can increase the number of pairwise comparisons exhibiting strong LD
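For readers unfamiliar with the coefficient named above, the following is a minimal sketch of the standard (Lewontin) definition of D' for two biallelic loci. It assumes the haplotype frequency is already known; in practice it has to be estimated from unphased genotypes, which is where missing data enter. The function name and parameters are illustrative and are not taken from JLIN.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Absolute value of Lewontin's D' for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2 (assumed known here)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b

    # Largest magnitude D can take given the allele frequencies;
    # the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    if d_max == 0:
        return 0.0  # at least one locus is monomorphic; D' is undefined
    return abs(d) / d_max
```

Because D' is a ratio of quantities estimated from the sample, dropping individuals changes both the numerator and the bound, so small shifts in estimated frequencies can move a SNP pair across a "strong LD" threshold.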

Introduction

As we begin to discover more about how haplotypes are defined and inherited, the emphasis in genetic association studies has moved away from the analysis of single-nucleotide polymorphisms (SNPs) to incorporate multilocus haplotype analysis. Haplotypes should offer greater statistical power to detect a true association at a given sample size than analyses based on single SNPs or combinations of SNPs [1-3], because they contain more genetic information than the genotypes alone. It is increasingly clear from other fields of statistical investigation that ignoring missing data, or restricting the analysis to subjects with complete data, can lead to biased or inefficient analyses even when data are missing completely at random [4-8]. In the case of missing genetic data, the effects on LD analysis and subsequent haplotype formation can be substantial, depending on the amount of data missing.
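To make the cost of restricting the analysis to complete subjects concrete, here is a minimal sketch of removing genotype calls at random and then keeping only individuals with no missing calls. It assumes each call is dropped independently with the same probability; the source does not state exactly how the removal was implemented, and all function names here are hypothetical.

```python
import random


def mask_genotypes(genotypes, rate, seed=0):
    """Return a copy in which each genotype call is independently
    replaced by None with probability `rate` (assumed scheme)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else g for g in row]
            for row in genotypes]


def complete_cases(genotypes):
    """Keep only individuals (rows) with no missing genotype calls."""
    return [row for row in genotypes if all(g is not None for g in row)]


def expected_complete_fraction(rate, n_snps):
    """Expected share of individuals with all `n_snps` calls present,
    under independent missingness at `rate` per call."""
    return (1 - rate) ** n_snps


if __name__ == "__main__":
    # Toy data: 1000 individuals, 10 SNPs, genotypes coded 0/1/2.
    rng = random.Random(1)
    data = [[rng.choice([0, 1, 2]) for _ in range(10)] for _ in range(1000)]
    for rate in (0.01, 0.05, 0.10):
        kept = complete_cases(mask_genotypes(data, rate, seed=2))
        print(rate, len(kept) / len(data),
              round(expected_complete_fraction(rate, 10), 3))
```

Under this assumption, 10% missingness per call across 10 SNPs leaves only about 35% of individuals with complete data (0.9 to the power 10), which shows how a modest per-SNP missing rate can remove most of the sample from a multilocus analysis.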
