Abstract
Microsatellites, a type of short tandem repeat (STR), have been used for decades as putatively neutral markers to study the genetic structure of diverse human populations. However, recent studies have demonstrated that some microsatellites contribute to gene expression, cis heritability, and phenotype. As a corollary, some microsatellites may contribute to differential gene expression and RNA/protein structure stability in distinct human populations. To test this hypothesis, we investigate genotype frequencies, functional relevance, and adaptive potential of microsatellites in five super-populations (ethnicities) drawn from the 1000 Genomes Project. We discover 3,984 ethnically-biased microsatellite loci (EBML); for each EBML, at least one ethnicity has genotype frequencies statistically different from the remaining four. South Asian, East Asian, European, and American EBML show significant overlap; in contrast, the set of African EBML is mostly unique. We cross-reference the 3,984 EBML with 2,060 previously identified expression STRs (eSTRs); repeats known to affect gene expression (64 total) are over-represented. The most significant pathway enrichments are those associated with the matrisome: a broad collection of genes encoding the extracellular matrix and its associated proteins. At least 14 of the EBML have established links to human disease. Analysis of the 3,984 EBML with respect to known selective sweep regions in the genome shows that allelic variation in some of them is likely associated with adaptive evolution.
Highlights
Two thirds of the human genome consists of repetitive DNA [1]
The overall pattern of variation is consistent with previous studies that focus on smaller panels of known polymorphic microsatellites [30,31,32]: AFR, EAS, and EUR in three outside clusters with AMR and SAS in two overlapping central clusters (Fig 1A)
The 434 ethnically-biased microsatellite loci (EBML) include 21 in the coding sequence (CDS) of 18 genes (S9 Table); one is a previously identified expression short tandem repeat (eSTR), located in the CDS of gene USP36. These findings suggest a degree of mutual overlap between EBML, eSTRs, and selective sweeps in the human genome
Summary
Two thirds of the human genome consists of repetitive DNA [1]. These repeats vary in size, complexity, and abundance in the genome: microsatellites are perhaps the simplest. Each microsatellite consists of a short motif (1–6 base pairs) repeated in tandem to form an array [2]; over 600,000 unique microsatellites exist in the human genome [3, 4].
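To make the definition of a microsatellite concrete, the following minimal Python sketch scans a DNA string for a 1–6 base pair motif repeated in tandem. It is an illustration only and not the method used in this study: the function name `find_microsatellites` and the `min_repeats` threshold are assumptions introduced here, and real repeat-detection tools apply motif-specific thresholds and tolerate imperfect repeats.

```python
import re


def find_microsatellites(seq, min_repeats=3, max_motif=6):
    """Return (start, motif, copies) for each perfect tandem repeat.

    Hypothetical helper: min_repeats is an illustrative threshold,
    not a value taken from the paper.
    """
    # Lazy quantifier tries the shortest motif first, so "AAAA" is
    # reported as motif "A" rather than motif "AA".
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    # A dinucleotide (CA) array flanked by non-repetitive bases.
    print(find_microsatellites("TTGCACACACAGT"))
```

The allele at such a locus is usually summarized by its copy number (here, four copies of `CA`), and it is variation in that count between individuals that genotype-frequency comparisons across populations rely on.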