The Y chromosome is a powerful tool for population geneticists studying human evolutionary history. Haploid and largely non-recombining, it should contain a simple record of past mutational events. However, this apparent simplicity is compromised by Y-linked duplicons, which make up approximately 35% of this chromosome; 25% of these duplicons are large inverted repeats (palindromes). For microsatellites lying in these palindromes, the two loci cannot be easily distinguished because of PCR co-amplification, and the resulting order misspecification of alleles generates an additional variance component. Because of this ambiguity, population geneticists have traditionally used an arbitrary method to assign the alleles (shorter allele to locus 1, larger allele to locus 2). Here, we simulate the posterior distributions of the resulting estimates under three novel allele assignment priors and compare them with those obtained using the original method. We use a sample of 33 human populations, typed for duplicated microsatellites lying within palindrome P8, to illustrate our approach. We show that both intra- and inter-population statistics can be dramatically affected by order misspecification. Surprisingly, matrices of pairwise F-statistics or distance estimates appear far less sensitive to order misspecification and remain relatively unchanged under the priors considered, suggesting that these microsatellites can serve as useful markers for population genetic studies, provided the data are treated appropriately. Duplicated microsatellites represent an attractive source of information for investigating the extensive structural polymorphism observed among human Y chromosomes, as well as processes of intra-chromosomal gene conversion acting between duplicons.
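The effect of the assignment rule on a per-locus statistic can be illustrated with a minimal Python sketch. It contrasts the conventional rule (shorter allele to locus 1, larger allele to locus 2) with a purely random assignment. The random rule is an illustrative assumption and is not one of the three priors evaluated in this study; the function names and the repeat counts are hypothetical and are not taken from the P8 data.

```python
import random

def assign_by_size(allele_pair):
    """Conventional rule: shorter allele to locus 1, larger allele to locus 2."""
    a, b = allele_pair
    return (min(a, b), max(a, b))

def assign_at_random(allele_pair, rng):
    """Illustrative alternative (an assumption): either order with probability 1/2."""
    a, b = allele_pair
    return (a, b) if rng.random() < 0.5 else (b, a)

def sample_variance(values):
    """Unbiased sample variance of a list of repeat counts."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

# Hypothetical repeat counts for one duplicated microsatellite in five chromosomes.
sample = [(10, 12), (11, 11), (9, 13), (10, 11), (12, 12)]

by_size = [assign_by_size(pair) for pair in sample]

rng = random.Random(0)  # fixed seed so the random assignment is reproducible
by_chance = [assign_at_random(pair, rng) for pair in sample]

# Repeat-count variance at "locus 1" under each assignment rule.
locus1_var_size = sample_variance([locus1 for locus1, _ in by_size])
locus1_var_random = sample_variance([locus1 for locus1, _ in by_chance])

print("locus 1 variance, size-ordered:", locus1_var_size)
print("locus 1 variance, random order:", locus1_var_random)
```

Repeating the random assignment over many seeds would give a distribution of the per-locus variance rather than a single value, which is the kind of sensitivity to allele order that the abstract describes.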