Abstract

Identifying regions of the genome that are depleted of mutations can reveal potentially deleterious variants. Short tandem repeats (STRs), also known as microsatellites, are among the largest contributors of de novo mutations in humans. However, per-locus studies of STR mutations have been limited to highly ascertained panels of several dozen loci. Here, we harnessed bioinformatics tools and a novel analytical framework to estimate mutation parameters for each STR in the human genome by correlating STR genotypes with local sequence heterozygosity. We applied our method to obtain robust estimates of the impact of local sequence features on mutation parameters and used these estimates to create a framework for measuring constraint at STRs by comparing observed vs. expected mutation rates. Constraint scores identified known pathogenic variants with early-onset effects. Our metric will provide a valuable tool for prioritizing pathogenic STRs in medical genetics studies.
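To make the idea of comparing observed and expected mutation rates concrete, the following is a minimal sketch of one plausible scoring scheme. It assumes a Z-like score on log10 mutation rates; the function name `constraint_z`, its parameters, and the choice of a log10 scale are illustrative assumptions and are not taken from the text above.

```python
import math


def constraint_z(observed_rate, expected_rate, stderr_log10):
    """Hypothetical Z-like constraint score for a single STR locus.

    observed_rate: per-generation mutation rate estimated from the data.
    expected_rate: rate predicted from local sequence features.
    stderr_log10:  standard error of the observed log10 rate (assumed known).

    Negative scores mean the locus mutates less often than expected,
    which is the signature of a constrained locus under this sketch.
    """
    if observed_rate <= 0 or expected_rate <= 0:
        raise ValueError("mutation rates must be positive")
    if stderr_log10 <= 0:
        raise ValueError("standard error must be positive")
    diff = math.log10(observed_rate) - math.log10(expected_rate)
    return diff / stderr_log10


# Example: a locus observed to mutate 100-fold less often than predicted.
score = constraint_z(observed_rate=1e-5, expected_rate=1e-3, stderr_log10=0.5)
print(f"constraint score: {score:.2f}")
```

Under these assumptions, loci with strongly negative scores would be the ones to prioritize; dividing by the standard error keeps poorly estimated loci from dominating the ranking.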

Highlights

  • Mutations that have negative fitness consequences tend to be eliminated from the population

  • Motivated by the poor fit of the widely used generalized stepwise mutation model (GSM) to our data (Supplementary Note), we developed a novel length-biased version of the GSM that closely recapitulates observed population-wide trends (Supplementary Note, Supplementary Figures 1, 2), including a saturation of the STR molecular clock over time

  • We developed a method called MUTEA that employs a similar model to precisely estimate individual mutation rates for Y chromosome STRs (Y-STRs) from population-scale sequencing of unrelated individuals
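The saturation effect mentioned in the highlights can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not the authors' model: steps are restricted to a single repeat unit, allele lengths are measured relative to a central length of 0, and the strength of the length bias is a hypothetical parameter `beta`. All function names are illustrative.

```python
import random


def prob_expansion(allele, beta):
    """Probability that a mutation adds a repeat unit.

    allele is the length relative to a central length of 0. With beta > 0,
    long alleles tend to contract and short alleles tend to expand.
    """
    p = 0.5 - 0.5 * beta * allele
    return min(1.0, max(0.0, p))


def simulate_lineage(generations, mu, beta, rng):
    """Follow one lineage and return its final relative allele length."""
    allele = 0
    for _ in range(generations):
        if rng.random() < mu:  # a mutation occurs in this generation
            if rng.random() < prob_expansion(allele, beta):
                allele += 1
            else:
                allele -= 1
    return allele


def variance_across_lineages(generations, mu, beta, n_lineages=200, seed=0):
    """Variance in allele length among independent lineages."""
    rng = random.Random(seed)
    alleles = [simulate_lineage(generations, mu, beta, rng)
               for _ in range(n_lineages)]
    mean = sum(alleles) / len(alleles)
    return sum((a - mean) ** 2 for a in alleles) / len(alleles)


# Without length bias the variance keeps growing with time;
# with length bias it levels off.
for t in (500, 1000, 2000):
    unbiased = variance_across_lineages(t, mu=0.05, beta=0.0)
    biased = variance_across_lineages(t, mu=0.05, beta=0.3)
    print(t, round(unbiased, 2), round(biased, 2))
```

In this toy setting the unbiased model accumulates variance roughly in proportion to elapsed time, whereas the length-biased model reaches a plateau, which is the qualitative behavior described as a saturating molecular clock.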


Introduction

Mutations that have negative fitness consequences tend to be eliminated from the population. Samocha et al.[1] determine the expected number of de novo variants per gene based on a neutral model obtained by counting mutations for each possible trinucleotide context in intergenic SNPs. In a different approach, fitCons[3] aggregates non-coding regions with similar functional annotations and compares observed variation in those regions to an expectation obtained from presumably neutral flanking regions. These methods have mainly focused on single nucleotide polymorphisms (SNPs) and, to a lesser extent, on small indels. Repeat variants are commonly excluded from medical genetics analyses.

