Abstract

Long-read sequencing (LRS) has the potential to comprehensively identify all medically relevant genome variation, including variation commonly missed by short-read sequencing (SRS) approaches. To determine this potential, we performed LRS around 15×–40× genome coverage using the Pacific Biosciences Sequel I System for five trios. The respective probands were diagnosed with intellectual disability (ID) whose etiology remained unresolved after SRS exomes and genomes. Systematic assessment of LRS coverage showed that ~35 Mb of the human reference genome was only accessible by LRS and not SRS. Genome-wide structural variant (SV) calling yielded on average 28,292 SV calls per individual, totaling 12.9 Mb of sequence. Trio-based analyses which allowed to study segregation, showed concordance for up to 95% of these SV calls across the genome, and 80% of the LRS SV calls were not identified by SRS. De novo mutation analysis did not identify any de novo SVs, confirming that these are rare events. Because of high sequence coverage, we were also able to call single nucleotide substitutions. On average, we identified 3 million substitutions per genome, with a Mendelian inheritance concordance of up to 97%. Of these, ~100,000 were located in the ~35 Mb of the genome that was only captured by LRS. Moreover, these variants affected the coding sequence of 64 genes, including 32 known Mendelian disease genes. Our data show the potential added value of LRS compared to SRS for identifying medically relevant genome variation.

Highlights

  • In the last decade short-read sequencing (SRS) approaches, such as whole exome sequencing (WES) and more recently whole genome sequencing (WGS), have revolutionized the field of medical genetics

  • Comparison of sequence coverage between short and long-read sequencing In Trio 5, we identified genomic regions that had no coverage with SRS but were well-covered by LRS using BEDtools genomecov [38]

  • We found that 33% of deletions (n = 4,526) and 73% of insertions (n = 15,963) in our cohort were novel compared to Audano et al (Table S8), highlighting the fact that there is still a large degree of previously hidden structural variation to be identified in the human genome

Read more

Summary

Introduction

In the last decade short-read sequencing (SRS) approaches, such as whole exome sequencing (WES) and more recently whole genome sequencing (WGS), have revolutionized the field of medical genetics. It has been shown that, due to technical limitations, SRS approaches often lack sensitivity and specificity for a large proportion of structural variants (SVs) [5,6,7]. These limitations can be overcome by long-read sequencing (LRS). LRS of a haploid human genome resulted in SV call-sets three to sevenfold larger than those produced by standard SRS studies such as the 1000 genomes project [12]. This makes SVs an even more important source of human genome variation than anticipated so far, accounting for the greatest number of divergent bases across the human genome [13, 14]

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call