Abstract

In the past decade, several studies have estimated the human per-generation germline mutation rate using large pedigrees. More recently, estimates for various nonhuman species have been published. However, methodological differences among studies in detecting germline mutations and estimating mutation rates make direct comparisons difficult. Here, we describe the steps involved in estimating pedigree-based mutation rates: sampling, sequencing, mapping, variant calling, filtering, and accounting for false-positive and false-negative rates. For each step, we review the methods and parameter choices used in the recent literature. Additionally, we present the results of a 'Mutationathon,' a competition organized among five research labs to compare germline mutation rate estimates for a single pedigree of rhesus macaques. We report almost twofold variation in the final estimated rate among groups using different post-alignment processing, calling, and filtering criteria, and provide details on the sources of variation across studies. Though the difference among estimates is not statistically significant, this discrepancy emphasizes the need for standardized methods in mutation rate estimation and the difficulty of comparing rates from different studies. Finally, this work aims to provide guidelines for computational and statistical benchmarks for future studies interested in identifying germline mutations from pedigrees.

Highlights

  • Germline mutations are the source of most genetic diseases and provide the raw material for evolution

  • When the age of the parents at conception is not available, the mean age of reproduction is used instead to estimate the per-year mutation rate. This approximation can bias the results if the parents were much older or much younger at conception than the mean age of reproduction in the population

  • We explored the effect of the individual filter on the number of candidate de novo mutations (DNMs), the number of false-positive calls (FP), the callable genome (CG), the false-negative rate (FNR), and the final estimated mutation rate per-site per generation (μ)
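
The quantities named in the highlights above (DNMs, FP, CG, FNR, μ) can be tied together in a short sketch. The formula below is one common way to combine them, not necessarily the exact estimator used by every group: it assumes CG is counted in haploid base pairs (hence the factor of 2 for a diploid genome), that FP is a count of candidates judged to be false, and that the per-year rate is obtained by dividing by a mean parental age. The function names and all input values are hypothetical and chosen only for illustration.

```python
def mutation_rate_per_generation(n_candidates, n_false_positives,
                                 callable_sites, false_negative_rate):
    """Per-site, per-generation rate under the assumed formula
    mu = (DNMs - FP) / (2 * CG * (1 - FNR))."""
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    if not 0.0 <= false_negative_rate < 1.0:
        raise ValueError("false_negative_rate must be in [0, 1)")
    if n_false_positives > n_candidates:
        raise ValueError("false positives cannot exceed candidates")
    # Candidates that survive removal of false positives.
    true_dnms = n_candidates - n_false_positives
    # Factor 2: each callable site is present on two parental haplotypes.
    return true_dnms / (2 * callable_sites * (1.0 - false_negative_rate))


def mutation_rate_per_year(mu_generation, mean_parental_age_years):
    """Per-year rate, assuming one generation spans the mean parental age."""
    if mean_parental_age_years <= 0:
        raise ValueError("mean_parental_age_years must be positive")
    return mu_generation / mean_parental_age_years


# Hypothetical inputs for a single trio (not taken from the study).
mu = mutation_rate_per_generation(
    n_candidates=30,          # candidate DNMs after filtering
    n_false_positives=2,      # candidates rejected on validation
    callable_sites=2.0e9,     # callable genome, haploid bp
    false_negative_rate=0.05,
)
mu_year = mutation_rate_per_year(mu, mean_parental_age_years=10.0)
print(f"mu per generation: {mu:.3e}")
print(f"mu per year:       {mu_year:.3e}")
```

Because μ depends on all four inputs, a stricter filter that lowers the number of candidates also tends to shrink the callable genome and raise the false-negative rate, so the inputs should be re-estimated together rather than one at a time.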


Summary

Introduction

Germline mutations are the source of most genetic diseases and provide the raw material for evolution. Accurately estimating the frequency at which mutations occur is crucial for understanding the course of evolutionary events. High-throughput next-generation sequencing makes it possible to estimate the germline mutation rate directly over a single generation, based on a whole-genome comparison of pedigree samples (mother, father, and offspring), without requiring assumptions about generation times or fossil calibrations (Tiley et al., 2020). Pedigree sequencing provides multiple pieces of information in addition to an overall mutation rate. Using pedigrees means that researchers often have precise information about the age of the parents at the time of reproduction, and comparing several trios (i.e., three related individuals: mother, father, and offspring) at different parental ages can reveal the effect of parental age on the total number of transmitted mutations, their location, and their spectrum.
