Abstract

Repeat expansions are responsible for over 40 monogenic disorders, and undoubtedly more pathogenic repeat expansions remain to be discovered. Existing methods for detecting repeat expansions in short-read sequencing data require predefined repeat catalogs. Recent discoveries emphasize the need for methods that do not require pre-specified candidate repeats. To address this need, we introduce ExpansionHunter Denovo, an efficient catalog-free method for genome-wide repeat expansion detection. Analysis of real and simulated data shows that our method can identify large expansions of 41 out of 44 pathogenic repeats, including nine recently reported non-reference repeat expansions not discoverable via existing methods.

Highlights

  • High-throughput whole-genome sequencing (WGS) has experienced rapid reductions in per-genome costs over the past 10 years [1], driving population-level sequencing projects and precision medicine initiatives at an unprecedented scale [2,3,4,5,6,7]

  • We demonstrate that ExpansionHunter Denovo (EHdn) can be used to rediscover the repeat expansions (REs) associated with fragile X syndrome (FXS), Friedreich ataxia (FRDA), myotonic dystrophy type 1 (DM1), and Huntington disease (HD) using case-control analysis to compare a small number of affected individuals (N = 14–35) to control samples (N = 150)

  • Subsequent comparisons of short tandem repeat (STR) profiles across multiple samples can reveal the locations of the pathogenic repeat expansions
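
The case-control comparison described in the highlights can be illustrated with a minimal sketch. The text above does not specify the statistic EHdn uses, so a one-sided permutation test on per-sample repeat-read counts is assumed here purely as a stand-in. The function names (`permutation_p_value`, `rank_loci`), the `profiles` layout, and the notion of a "normalized count" per locus are all hypothetical and are not EHdn's actual interface or output format.

```python
import random
from statistics import mean


def permutation_p_value(case_counts, control_counts, n_permutations=2000, seed=0):
    """One-sided permutation test: are counts higher in cases than in controls?

    Assumption: this is a generic stand-in test, not the statistic used by EHdn.
    """
    rng = random.Random(seed)
    observed = mean(case_counts) - mean(control_counts)
    pooled = list(case_counts) + list(control_counts)
    n_cases = len(case_counts)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign case/control labels and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_cases]) - mean(pooled[n_cases:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_permutations + 1)


def rank_loci(profiles, case_ids, control_ids):
    """Rank loci by p-value, smallest first.

    profiles: hypothetical mapping {locus: {sample_id: normalized count}};
    samples missing from a locus are treated as having a count of 0.
    """
    results = []
    for locus, counts in profiles.items():
        cases = [counts.get(s, 0.0) for s in case_ids]
        controls = [counts.get(s, 0.0) for s in control_ids]
        results.append((permutation_p_value(cases, controls), locus))
    return sorted(results)
```

Calling `rank_loci(profiles, case_ids, control_ids)` returns `(p_value, locus)` pairs sorted so that loci whose counts are most enriched in cases come first. With as few cases as in the study above (N = 14–35), a rank-based or permutation approach is a natural choice because it makes no distributional assumptions, but the choice here is illustrative only.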


Introduction

High-throughput whole-genome sequencing (WGS) has experienced rapid reductions in per-genome costs over the past 10 years [1], driving population-level sequencing projects and precision medicine initiatives at an unprecedented scale [2,3,4,5,6,7]. The primary limitations of these studies are the completeness of the reference genome and the ability to identify putative causal variations against the reference background. A wide variety of software tools can identify variations relative to the reference genome, such as single nucleotide variants and short (1–50 bp) insertions and deletions [8,9,10,11,12,13], copy number variants [14,15], and structural variants [15,16,17]. Because some variants include large amounts of inserted sequence relative to the reference, methods that can analyze reads that do not align to the reference are needed.

Dolzhenko et al. Genome Biology (2020) 21:102
