Abstract

Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence, which is based on short reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.
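The coverage figures in the downsampling analysis (20-fold, five-fold) are ratios of sequenced bases to genome size. The sketch below shows that arithmetic and one simple way to downsample a read set to a target depth by random read selection. It is an illustration under stated assumptions, not the authors' pipeline: the genome size `GENOME_SIZE_BP` (4.2 Gb) is an assumed, illustrative value, the function names are hypothetical, and `poisson_fraction_covered` is the idealized Lander-Waterman estimate of the fraction of bases covered by at least one read, which ignores repeats, coverage bias and assembly effects.

```python
import math
import random

# ASSUMPTION: illustrative haploid genome size in base pairs; substitute a real estimate.
GENOME_SIZE_BP = 4_200_000_000


def coverage(read_lengths, genome_size):
    """Fold-coverage (depth) = total sequenced bases / genome size."""
    return sum(read_lengths) / genome_size


def required_yield(target_coverage, genome_size):
    """Total sequenced bases needed to reach the target fold-coverage."""
    return target_coverage * genome_size


def downsample(read_lengths, genome_size, target_coverage, seed=0):
    """Hypothetical helper: randomly keep reads until the kept bases reach
    target_coverage * genome_size (the last read may overshoot slightly).

    Returns the sorted indices of the selected reads; if the input has less
    data than requested, all reads are returned.
    """
    rng = random.Random(seed)           # fixed seed for reproducible subsets
    order = list(range(len(read_lengths)))
    rng.shuffle(order)                  # random read order avoids length or position bias
    budget = target_coverage * genome_size
    kept, total = [], 0
    for i in order:
        if total >= budget:
            break
        kept.append(i)
        total += read_lengths[i]
    return sorted(kept)


def poisson_fraction_covered(c):
    """Idealized (Lander-Waterman) fraction of bases covered at least once at depth c."""
    return 1.0 - math.exp(-c)
```

For example, with the assumed genome size, `required_yield(20, GENOME_SIZE_BP) / 1e9` gives 84.0 Gb of CCS data for 20-fold coverage, and `required_yield(5, GENOME_SIZE_BP) / 1e9` gives 21.0 Gb for five-fold coverage. Random selection of whole reads keeps the read-length distribution of the subset close to that of the full data set.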

Highlights

  • The first plant genome projects, despite focusing on model plants with small genomes, were undertaken by international consortia on the time scale of one decade (The Arabidopsis Genome, 2000; Sasaki and International Rice Genome Sequencing, 2005)

  • We focused on three pseudomolecule versions: Morex V1, Morex V2 (WGS short-read) and Morex V3

  • Hybrid approaches, in which short-read assemblies are scaffolded with long reads, are not worthwhile, as the added cost and effort of obtaining and using short reads does not translate into improved assembly quality
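Comparisons of assembly quality such as the one above often rely on contiguity statistics. The text here does not name its metrics, so as an assumption the sketch below uses contig N50, a widely used contiguity measure: the length L such that contigs of length at least L together contain at least half of the total assembly length. The function name `n50` is hypothetical and the example input is made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest and their lengths summed until
    the running total reaches at least half of the total assembly length; the
    length of the contig at which that happens is the N50.
    """
    if not contig_lengths:
        raise ValueError("empty assembly")
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
```

For the toy lengths `[2, 3, 4, 5, 6, 7, 8, 9, 10]` (total 54), the running sum over the longest contigs reaches 27 at the contig of length 8, so the N50 is 8. Because N50 is computed relative to the assembly's own size, comparisons between assemblies of different total length are sometimes made with NG50, which uses an estimated genome size in place of the assembly length.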


Introduction

The first plant genome projects, despite focusing on model plants with small genomes, were undertaken by international consortia on the time scale of one decade (The Arabidopsis Genome, 2000; Sasaki and International Rice Genome Sequencing, 2005). Suppressed recombination within large parts of the Triticeae genomes makes genetic mapping an ineffective tool for arranging sequence scaffolds along the chromosomes. Owing to these challenges, chromosome-scale whole-genome shotgun (WGS) assemblies of the wheat and barley genomes were published only in the last three years (The International Wheat Genome Sequencing Consortium (IWGSC), 2018; Monat et al., 2019). A crucial technological advance enabling WGS assembly from short reads in Triticeae crops was the development of an effective protocol for iterative scaffolding of primary contigs using mate-pair libraries, 10X Chromium linked reads and chromosome-conformation capture sequencing (Hi-C) to arrange sequence scaffolds into chromosomal pseudomolecules (Avni et al., 2017; Maccaferri et al., 2019; Monat et al., 2019).
