Abstract

Background: The recent discovery of widespread copy number variation in humans has forced a shift away from the assumption of two copies per locus per cell throughout the autosomal genome. In particular, a SNP site can no longer always be accurately assigned one of three genotypes in an individual. In the presence of copy number variability, the individual may theoretically harbor any number of copies of each of the two SNP alleles.

Results: To address this issue, we have developed a method to infer a "generalized genotype" from raw SNP microarray data. Here we apply our approach to data from 48 individuals and uncover thousands of aberrant SNPs, most in regions that were previously unreported as copy number variants. We show that our allele-specific copy numbers follow Mendelian inheritance patterns that would be obscured in the absence of SNP allele information. The interplay between duplication and point mutation in our data sheds light on the relative frequencies of these events in human history, showing that at least some of the duplication events were recurrent.

Conclusion: This new multi-allelic view of SNPs has a complicated role in disease association studies, and further work will be necessary in order to accurately assess its importance. Software to perform generalized genotyping from SNP array data is freely available online [1].

Highlights

  • The recent discovery of widespread copy number variation in humans has forced a shift away from the assumption of two copies per locus per cell throughout the autosomal genome

  • It is likely that our results drastically underrepresent the prevalence of aberrant SNPs in the population, as the array's manufacturer deliberately excluded SNPs that violated Hardy-Weinberg equilibrium, Mendelian inheritance, and other quality control requirements [10] that would naturally not be met in the presence of copy number variants (CNVs)

  • Our own requirement that at least three consecutive SNPs show the CNV is very conservative, and will by definition omit more focal events
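The three-consecutive-SNP requirement in the last highlight can be illustrated with a small run-length filter. This is only a sketch, not the authors' software: the function name `cnv_segments`, the idea that each SNP already carries an integer total copy-number call, and the choice to require the same call across the run are all assumptions made for illustration.

```python
from itertools import groupby

def cnv_segments(copy_numbers, min_run=3, normal=2):
    """Return (start, end, copy_number) for runs of identical non-normal calls.

    copy_numbers: per-SNP total copy-number calls in genomic order along one
    chromosome (hypothetical input format). `end` is exclusive. Runs shorter
    than min_run are dropped, so focal events are omitted by construction.
    """
    segments = []
    pos = 0
    for cn, group in groupby(copy_numbers):
        length = len(list(group))
        if cn != normal and length >= min_run:
            segments.append((pos, pos + length, cn))
        pos += length
    return segments

calls = [2, 3, 3, 3, 2, 1, 1, 2, 0, 0, 0, 0]
print(cnv_segments(calls))
```

In this toy input the two-SNP run of single-copy calls is discarded, which is the conservative behavior the highlight describes: shorter events are missed in exchange for fewer spurious calls.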



Introduction

The recent discovery of widespread copy number variation in humans has forced a shift away from the assumption of two copies per locus per cell throughout the autosomal genome. As the importance of these duplications and deletions in the study of a variety of diseases [3,4,5,6] is being realized, cataloging them and assessing their frequencies has become an important goal. Toward this end, two recent studies [7,8] have exploited erroneous SNP genotype calls, inferring germline deletions at clusters of calls that violate Mendelian inheritance or other conditions. As the most recent estimates put the proportion of the genome harboring CNVs at 12% or more [9], allowing for more general genotypes is crucial for the accuracy of SNP typing in disease studies. Such direct and accurate typing would reveal CNVs automatically.
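To make the idea of a more general genotype concrete, the sketch below represents each individual's call at a SNP as a pair (copies of allele A, copies of allele B) and checks whether a child's pair could arise from its parents. It is an illustration under stated assumptions, not the method of this paper or of the cited studies: the function names are hypothetical, phase is treated as unknown so a parent may transmit any subset of its copies, and the check is therefore only a necessary condition for Mendelian consistency.

```python
def possible_haplotypes(genotype):
    """All (a, b) copy counts one parental chromosome could carry.

    genotype: (copies of A, copies of B) summed over both chromosomes.
    Assumption: phase is unknown, so any split of the copies is allowed.
    """
    a, b = genotype
    return {(i, j) for i in range(a + 1) for j in range(b + 1)}

def mendelian_consistent(mother, father, child):
    """True if child equals the sum of one haplotype from each parent."""
    return any(
        (m[0] + f[0], m[1] + f[1]) == child
        for m in possible_haplotypes(mother)
        for f in possible_haplotypes(father)
    )

# Ordinary diploid case: AA mother, BB father, AB child.
print(mendelian_consistent((2, 0), (0, 2), (1, 1)))
# Child with three B copies when the parents carry only two in total.
print(mendelian_consistent((2, 0), (0, 2), (0, 3)))
# Duplication carrier: a mother with AAB can give the child A and B together.
print(mendelian_consistent((2, 1), (2, 0), (2, 1)))
```

The third case shows why allele-specific copy numbers matter: a child called (2, 1) would look like a failed or inconsistent call if forced into one of the three standard genotypes, yet it is a plausible transmission once copy number is allowed to vary.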
