Abstract
Germline mutation detection from human DNA sequence data is challenging due to the rarity of such events relative to the intrinsic error rates of sequencing technologies and the uneven coverage across the genome. We developed PhaseByTransmission (PBT) to identify de novo single nucleotide variants and short insertions and deletions (indels) from sequence data collected in parent-offspring trios. We compute the joint probability of the data given the genotype likelihoods in the individual family members, the known familial relationships and a prior probability for the mutation rate. Candidate de novo mutations (DNMs) are reported along with their posterior probability, providing a systematic way to prioritize them for validation. Our tool is integrated in the Genome Analysis Toolkit and can be used together with the ReadBackedPhasing module to infer the parental origin of DNMs based on phase-informative reads. Using simulated data, we show that PBT outperforms existing tools, especially in low coverage data and on the X chromosome. We further show that PBT displays high validation rates on empirical parent-offspring sequencing data for whole-exome data from 104 trios and X-chromosome data from 249 parent-offspring families. Finally, we demonstrate an association between father’s age at conception and the number of DNMs in female offspring’s X chromosome, consistent with previous literature reports.
Highlights
De novo mutation (DNM) between generations is a key mechanism in evolution
PBT considers biallelic single nucleotide variants (SNVs) and short insertions and deletions within the autosomes and the X chromosome, and generates a list of all candidate DNMs ranked by their posterior probability
A key advantage is the integration of PBT within the widely used Genome Analysis Toolkit (GATK)[9] and its ability to leverage phase information from the GATK ReadBackedPhasing module to identify the parental origin of DNMs
Summary
De novo mutation (DNM) between generations is a key mechanism in evolution. In humans, the mutation rate is estimated at between 1 × 10⁻⁸ and 3 × 10⁻⁸ per base per generation, both from direct observations[1,2,3,4] and from species comparisons.[5] Mutation rates have been shown to vary locally[2,6] and across families,[2,3,4] and to depend on paternal age.[3] Next-generation sequencing (NGS) technologies applied to whole genomes in pedigrees enable systematic discovery and analysis of DNMs. Because the error rates from NGS data are currently much greater than the underlying DNM rate, detecting DNMs from NGS data requires accurate, quantitative calibration of the evidence supporting a novel allele in the offspring and the evidence against Mendelian transmission of this allele from (one of) the parents. A key advantage of PBT is its integration within the widely used Genome Analysis Toolkit (GATK)[9] and its ability to leverage phase information from the GATK ReadBackedPhasing module to identify the parental origin of DNMs.
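To make the idea of weighing a novel allele in the offspring against Mendelian transmission concrete, the following is a minimal Python sketch of a trio posterior calculation at one biallelic autosomal site. It is not the PBT implementation: the function names, the Hardy-Weinberg prior on parental genotypes, the default `alt_freq` and the symmetric per-allele mutation model are all assumptions made for illustration, and the X chromosome is not handled. Genotypes are encoded as the number of alternate alleles (0, 1 or 2), and each `gl_*` argument holds the likelihood of that individual's reads under each genotype.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles at a biallelic site


def transmit_alt(parent_gt, mu):
    """Probability that a parent passes on the alternate allele.

    Assumption: a transmitted allele flips to the other allele with
    probability mu (a symmetric per-allele mutation model).
    """
    p = parent_gt / 2.0
    return p * (1.0 - mu) + (1.0 - p) * mu


def child_given_parents(child_gt, mother_gt, father_gt, mu):
    """P(child genotype | parental genotypes), allowing mutation."""
    pm = transmit_alt(mother_gt, mu)
    pf = transmit_alt(father_gt, mu)
    if child_gt == 0:
        return (1.0 - pm) * (1.0 - pf)
    if child_gt == 1:
        return pm * (1.0 - pf) + (1.0 - pm) * pf
    return pm * pf


def hardy_weinberg(gt, alt_freq):
    """Assumed prior on a parental genotype (Hardy-Weinberg proportions)."""
    q = alt_freq
    return ((1.0 - q) ** 2, 2.0 * q * (1.0 - q), q ** 2)[gt]


def de_novo_posterior(gl_mother, gl_father, gl_child, mu=1e-8, alt_freq=1e-3):
    """Posterior probability that the trio is a Mendelian violation.

    Sums the joint probability over all 27 genotype combinations and
    returns the share carried by combinations that are impossible
    without a mutation (transmission probability zero when mu = 0).
    """
    total = 0.0
    de_novo = 0.0
    for m, f, c in product(GENOTYPES, repeat=3):
        joint = (
            gl_mother[m] * gl_father[f] * gl_child[c]
            * hardy_weinberg(m, alt_freq) * hardy_weinberg(f, alt_freq)
            * child_given_parents(c, m, f, mu)
        )
        total += joint
        if child_given_parents(c, m, f, 0.0) == 0.0:
            de_novo += joint
    return de_novo / total


if __name__ == "__main__":
    # Hypothetical likelihoods: child clearly heterozygous in both cases.
    child = (1e-12, 1.0, 1e-12)
    well_covered_parent = (1.0, 1e-12, 1e-20)  # strongly homozygous reference
    poorly_covered_parent = (1.0, 1e-3, 1e-6)  # heterozygous not excluded
    print(de_novo_posterior(well_covered_parent, well_covered_parent, child))
    print(de_novo_posterior(poorly_covered_parent, poorly_covered_parent, child))
```

With the hypothetical inputs above, the first call gives a posterior close to 1, while the second gives a small posterior: when a parental heterozygous genotype is only weakly excluded, inherited variation explains the data far better than a mutation with a prior near 10⁻⁸. This is the sense in which the parental evidence has to be calibrated as carefully as the evidence in the offspring.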