Abstract

Despite the significant advances made over the last few years in mapping inversions with the advent of paired-end sequencing approaches, our understanding of the prevalence and spectrum of inversions in the human genome has lagged behind that of other types of structural variants, mainly because of the lack of a cost-efficient method applicable to large samples. We propose a novel method based on principal components analysis (PCA) to characterize inversion polymorphisms using high-density SNP genotype data. Our method applies to non-recurrent inversions for which recombination between the inverted and non-inverted segments in inversion heterozygotes is suppressed due to the loss of unbalanced gametes. Within such an inversion region, this creates an effect similar to population substructure: two distinct “populations” of inversion homozygotes of different orientations, plus their 1:1 admixture, namely the inversion heterozygotes. This kind of substructure can be readily detected by performing PCA locally in the inversion region. Using simulations, we demonstrated that the proposed method can detect and genotype inversion polymorphisms using unphased genotype data. We applied our method to the phase III HapMap data and inferred the inversion genotypes of known inversion polymorphisms at 8p23.1 and 17q21.31. These inversion genotypes were validated by comparison with published results and by checking Mendelian consistency in the family data, where available. Using the PCA approach, we also performed a preliminary genome-wide scan for inversions in the HapMap data, which yielded 2040 candidate inversions, 169 of which overlapped with previously reported inversions. Our method can be readily applied to the abundant existing SNP data and is expected to play an important role in building human genome maps of inversions and in exploring associations between inversions and disease susceptibility.
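
The local-PCA idea in the abstract can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' implementation. The sample sizes, the inversion frequency, the independent per-SNP allele frequencies for the two orientations, and the helper names `local_pc1`, `kmeans_1d` and `mendelian_consistent` are all hypothetical. Plain 1-D k-means also stands in for whatever rule the paper uses to call the three clusters. The code simulates unphased 0/1/2 genotypes inside one region, takes PC1 scores, splits them into three groups, and adds a trio check of the kind used for Mendelian validation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulation settings, chosen for illustration only.
N_SNPS, N_PEOPLE, INV_FREQ = 200, 300, 0.3

# Assumption: suppressed recombination lets the two orientations drift apart
# like two populations, modeled here as independent per-SNP allele frequencies.
p_std = rng.uniform(0.05, 0.95, N_SNPS)
p_inv = rng.uniform(0.05, 0.95, N_SNPS)

# True inversion genotype = number of inverted chromosomes (0, 1 or 2).
true_inv = rng.binomial(2, INV_FREQ, N_PEOPLE)

# Unphased genotypes: each chromosome's alleles are drawn from the frequencies
# of its own orientation, then the two copies are summed.
geno = np.zeros((N_PEOPLE, N_SNPS), dtype=int)
for i, g in enumerate(true_inv):
    for chrom in range(2):
        freqs = p_inv if chrom < g else p_std
        geno[i] += (rng.random(N_SNPS) < freqs).astype(int)

def local_pc1(genotypes):
    """PC1 scores from a 0/1/2 genotype matrix restricted to one region."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def kmeans_1d(values, n_iter=100):
    """Three-cluster 1-D k-means seeded at the min, midpoint and max of the scores."""
    lo, hi = values.min(), values.max()
    centers = np.array([lo, (lo + hi) / 2, hi])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new = np.array([values[labels == k].mean() if np.any(labels == k) else centers[k]
                        for k in range(3)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

scores = local_pc1(geno)
labels, centers = kmeans_1d(scores)
rank = np.empty(3, dtype=int)
rank[np.argsort(centers)] = np.arange(3)
called = rank[labels]  # 0/1/2 in PC1 order; the middle cluster is heterozygous

# PC1's sign is arbitrary, so PCA alone cannot say which homozygote is inverted;
# in this simulation we keep whichever orientation matches the truth better.
accuracy = max(np.mean(called == true_inv), np.mean(2 - called == true_inv))
print(f"genotype concordance: {accuracy:.3f}")

def mendelian_consistent(father, mother, child):
    """Can a child with `child` inverted copies arise from these parents?"""
    passed_on = {0: (0,), 1: (0, 1), 2: (1,)}
    return any(a + b == child for a in passed_on[father] for b in passed_on[mother])

print(mendelian_consistent(1, 1, 2), mendelian_consistent(0, 0, 1))  # True False
```

The middle cluster can always be called heterozygous. Deciding which outer cluster carries the inverted orientation needs an external anchor, such as samples whose orientation is already known. The sketch sidesteps this by comparing against the simulated truth.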

Highlights

  • Common structural variations in the human genome such as deletions, duplications, and inversions are known to be associated with disease susceptibility [1,2,3] and to be subject to selection [4]

  • If only markers inside the inversion region are used for principal components analysis (PCA), as shown in Figure 1, variation caused by the inversion polymorphism dominates and the three clusters are distributed along the first eigenvector

  • We propose that the three-stripe pattern in the inversion region is a special case of admixture: because recombination is suppressed in heterozygotes, the inverted and non-inverted segments behave as if they came from different populations and are represented by the two outer clusters, while the inversion heterozygotes are a perfect 1:1 admixture of the two homozygotes and form the middle cluster, halfway between the other two
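
The claim that the heterozygotes sit in the middle follows from linearity. Suppose each chromosome's alleles at a SNP are drawn with frequency p_std or p_inv depending on its orientation (an idealizing assumption). Then a carrier of g inverted copies has expected genotype vector (2 - g) * p_std + g * p_inv, which is linear in g. The short check below uses hypothetical frequencies, and the helper `expected_genotype` is illustrative. It confirms that the heterozygote profile is the exact average of the two homozygote profiles. That profile therefore projects to the midpoint of any linear axis, including a PC1 loading vector.

```python
import numpy as np

rng = np.random.default_rng(1)
n_snps = 150

# Hypothetical per-SNP allele frequencies on standard and inverted chromosomes.
p_std = rng.uniform(0.05, 0.95, n_snps)
p_inv = rng.uniform(0.05, 0.95, n_snps)

def expected_genotype(n_inverted):
    """Expected 0/1/2 allele counts for a carrier of n_inverted inverted copies."""
    return (2 - n_inverted) * p_std + n_inverted * p_inv

mu0, mu1, mu2 = (expected_genotype(g) for g in (0, 1, 2))

# The heterozygote profile is an exact 1:1 mixture of the homozygote profiles...
assert np.allclose(mu1, (mu0 + mu2) / 2)

# ...so any linear projection (e.g. a PC1 loading vector) maps it to the midpoint.
w = rng.normal(size=n_snps)
w /= np.linalg.norm(w)
s0, s1, s2 = mu0 @ w, mu1 @ w, mu2 @ w
print(round((s1 - s0) / (s2 - s0), 6))  # → 0.5
```

Individual samples scatter around these expected points, so it is the cluster centers, not every heterozygote, that land at the midpoint. Centering the data before PCA does not change this, because shifting all points by the same vector preserves midpoints.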

Summary

Introduction

Common structural variations in the human genome such as deletions, duplications, and inversions are known to be associated with disease susceptibility [1,2,3] and to be subject to selection [4]. Among the different types of structural variation, inversions remain difficult, or at least laborious, to characterize in the human genome because of the lack of a high-throughput technique for detecting them. The Database of Genomic Variants (DGV) lists 953 inversion regions [7]. Although the sequencing-based method has been used successfully to screen for inversions, it has some limitations [8] and does not scale efficiently to the large numbers of samples needed to characterize inversions in a population and to detect their association with diseases.
