Abstract

Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. Here we report a Danish pan-genome obtained from sequencing 10 trios to high depth (50×). We report 536k novel SNVs and 283k novel short indels from mapping approaches and develop a population-wide de novo assembly approach to identify 132k novel indels larger than 10 nucleotides with low false discovery rates. We identify a higher proportion of indels and SVs than previous efforts, showing the merits of high coverage and de novo assembly approaches. In addition, we use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27e−8 and 1.5e−9 per nucleotide per generation for SNVs and indels, respectively.

Highlights

  • Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts

  • Because the number of novel indels is high and indels in repeat regions are hard to call accurately, we classified the indels by their primary sequence context into homopolymer runs (HRs) and tandem repeats (TRs)[4]

  • In total, 40.9% of the indels were associated with a canonical HR or TR, and an additional 19.3% were associated with non-canonical HR or TR sites
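The sequence-context classification mentioned in the highlights can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the thresholds (`min_hr`, `min_tr_copies`, `max_unit`) and the three-way labelling are assumptions chosen for illustration, and the summary does not say how canonical and non-canonical sites are distinguished, so that distinction is not modelled here.

```python
def run_length(seq: str, pos: int) -> int:
    """Length of the run of identical bases that contains seq[pos]."""
    base = seq[pos]
    start = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1
    end = pos
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    return end - start + 1


def count_copies(seq: str, pos: int, unit_len: int) -> int:
    """Consecutive copies of the unit seq[pos:pos+unit_len] around pos."""
    unit = seq[pos:pos + unit_len]
    if len(unit) < unit_len:
        return 0
    right = pos
    while seq[right:right + unit_len] == unit:  # extend to the right
        right += unit_len
    left = pos
    while left - unit_len >= 0 and seq[left - unit_len:left] == unit:  # extend left
        left -= unit_len
    return (right - left) // unit_len


def classify_context(seq: str, pos: int, min_hr: int = 6,
                     min_tr_copies: int = 3, max_unit: int = 6) -> str:
    """Label the reference context at pos as 'HR', 'TR' or 'other'.

    All thresholds are hypothetical defaults, not values from the study.
    """
    if run_length(seq, pos) >= min_hr:
        return "HR"
    for unit_len in range(2, max_unit + 1):
        unit = seq[pos:pos + unit_len]
        # Require at least two distinct bases so a homopolymer is not counted as a TR.
        if len(set(unit)) > 1 and count_copies(seq, pos, unit_len) >= min_tr_copies:
            return "TR"
    return "other"
```

For example, `classify_context("ACGTAAAAAAAGCT", 5)` falls inside a run of seven A bases and is labelled as a homopolymer run under these assumed thresholds, while a position inside a repeated CA dinucleotide is labelled as a tandem repeat.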

Summary

Introduction

Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. We use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27e−8 and 1.5e−9 per nucleotide per generation for SNVs and indels, respectively. As de novo assembly previously has shown promise for detection of SVs in single human genomes[19,20], we expand this approach to be used at population level and identify 53.2k and 78.5k novel deletions and insertions (>10 bp), respectively, with a low FDR. Medium-sized (20–300 bp) insertions display a high rate of novelty (49.0k of 53.1k are novel; 92.2%) and a low overlap with alignment-based methods (<10%). This is likely due to ascertainment bias when using traditional alignment-based methods for detection of insertions and underlines the importance of de novo assembly-based techniques for discovering variation.
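To make the units of the per-nucleotide, per-generation rates quoted above concrete, here is a minimal sketch of a naive counting estimator. It is a stand-in for illustration only and is not the probabilistic method the authors describe; the function name and all input values are hypothetical and are not taken from the study.

```python
def naive_mutation_rate(n_de_novo: int, callable_sites: int, n_children: int) -> float:
    """Naive de novo mutation rate per nucleotide per generation.

    Each child carries two haploid genomes, one per parent, so the number of
    transmitted sites is 2 * callable_sites for every child.
    """
    if callable_sites <= 0 or n_children <= 0:
        raise ValueError("callable_sites and n_children must be positive")
    return n_de_novo / (2 * callable_sites * n_children)


# Hypothetical inputs for illustration, not data from the paper.
rate = naive_mutation_rate(n_de_novo=600, callable_sites=2_500_000_000, n_children=10)
print(f"{rate:.2e}")
```

A counting estimator of this kind ignores false negatives and false positives in de novo calling, which is one reason a probabilistic approach may be preferred in practice.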
