Abstract

Background: Whole-genome shotgun sequencing, which stitches together millions of short sequencing reads into a single genome, ushered in the era of modern genomics and led to a rapid expansion of the number of genome sequences available. Nevertheless, assembly of short reads remains difficult, resulting in fragmented genome sequences. Ultimately, only a sequencing technology capable of capturing complete chromosomes in a single run could resolve all ambiguities. Even “third generation” sequencing technologies produce reads far shorter than most eukaryotic chromosomes. However, the ciliate Oxytricha trifallax has a somatic genome with thousands of chromosomes averaging only 3.2 kbp, making it an ideal candidate for exploring the benefits of sequencing whole chromosomes without assembly.

Results: We used single-molecule real-time sequencing to capture thousands of complete chromosomes in single reads and to update the published Oxytricha trifallax JRB310 genome assembly. In this version, over 50% of the completed chromosomes with two telomeres derive from single reads. The improved assembly includes over 12,000 new chromosome isoforms, and demonstrates that somatic chromosomes derive from variable rearrangements between somatic segments encoded up to 191,000 base pairs away. However, while long reads reduce the need for assembly, a hybrid approach that supplements long-read sequencing with short reads for error correction produced the most complete and accurate assembly, overall.

Conclusions: This assembly provides the first example of complete eukaryotic chromosomes captured by single sequencing reads and demonstrates that traditional approaches to genome assembly can mask considerable structural variation.

Highlights

  • Whole-genome shotgun sequencing, pioneered in eukaryotes during the human genome project, has become such common practice that over 38,000 genome assemblies are now available from NCBI [1]

  • The higher resolution provided by long reads has made it possible to produce high-quality reference sequences that capture structural variation that short-read sequencing cannot resolve [7, 8] and even to automate the completion of microbial genomes [9]


Summary

Introduction

Whole-genome shotgun sequencing, which stitches together millions of short sequencing reads into a single genome, ushered in the era of modern genomics and led to a rapid expansion of the number of genome sequences available. The higher resolution provided by long reads has made it possible to produce high-quality reference sequences that capture structural variation that short-read sequencing cannot resolve [7, 8] and even to automate the completion of microbial genomes [9]. While the germline genome of Oxytricha trifallax contains hundreds of long chromosomes, the somatic genome is highly fragmented, with ~20,000 different chromosomes that average just 3.2 kb in length [13, 14] and possess very few well-positioned nucleosomes [10]. These somatic chromosomes derive from a copy of the germline through an elaborate process of RNA-guided genome rearrangement that eliminates 90–95% of the germline sequence, including all internally eliminated sequences (IESs), stitches together the remaining germline segments in the correct order [15, 16], and adds telomeres to chromosome ends (reviewed in Yerlici and Landweber [17]).
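The idea of a complete chromosome captured in a single read, with a telomere at each end, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the Oxytricha telomeric repeat unit is C4A4 at the 5′ end and G4T4 at the 3′ end, and the function names (`has_both_telomeres`, `complete_chromosome_reads`), the 50-base search window, and the two-repeat minimum are all hypothetical choices made for illustration.

```python
import re

# Assumption (not stated in the text above): telomeres appear as C4A4 repeats
# at the 5' end of a read and G4T4 repeats at the 3' end.
# Requiring at least two consecutive repeat units is an arbitrary threshold.
FIVE_PRIME_TELOMERE = re.compile(r"(?:CCCCAAAA){2,}")
THREE_PRIME_TELOMERE = re.compile(r"(?:GGGGTTTT){2,}")


def has_both_telomeres(read: str, window: int = 50) -> bool:
    """Return True if telomeric repeats occur within `window` bases of both ends.

    Exact matching is used for simplicity; raw long reads contain sequencing
    errors, so a real analysis would need error-tolerant matching.
    """
    read = read.upper()
    head, tail = read[:window], read[-window:]
    return bool(FIVE_PRIME_TELOMERE.search(head)) and bool(
        THREE_PRIME_TELOMERE.search(tail)
    )


def complete_chromosome_reads(reads: dict[str, str]) -> list[str]:
    """Return the names of reads that look like telomere-to-telomere chromosomes."""
    return [name for name, seq in reads.items() if has_both_telomeres(seq)]


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    reads = {
        "two_telomeres": "CCCCAAAA" * 3 + "ACGT" * 100 + "GGGGTTTT" * 3,
        "one_telomere": "ACGT" * 100 + "GGGGTTTT" * 3,
    }
    print(complete_chromosome_reads(reads))
```

Reads that pass such a check would need no assembly to define chromosome ends, whereas reads with one or no telomere would still have to be joined with others.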

