Abstract

The size of the chloroplast genome (plastome) of autotrophic angiosperms is generally conserved. However, the chloroplast genomes of some lineages are greatly expanded, which may make assembling these genomes from short-read sequencing data more challenging. Here, we present the sequencing, assembly, and annotation of the chloroplast genomes of Cypripedium tibeticum and Cypripedium subtropicum. We de novo assembled the chloroplast genomes of the two species with a combination of short-read Illumina data and long-read PacBio data. The plastomes of the two species are characterized by expanded genome size, proliferated AT-rich repeat sequences, low GC content and gene density, as well as low substitution rates of the coding genes. The plastomes of C. tibeticum (197,815 bp) and C. subtropicum (212,668 bp) are substantially larger than those of the three species sequenced in previous studies. The plastome of C. subtropicum is the longest reported in Orchidaceae to date. Despite the increase in genome size, the gene order and gene number of the plastomes are conserved, with the exception of a large (∼75 kb) inversion in the large single copy (LSC) region shared by the two species. Most striking is the record-setting low GC content in C. subtropicum (28.2%). Moreover, the plastome expansion of the two species is strongly correlated with the proliferation of AT-biased non-coding regions: the non-coding content of C. subtropicum exceeds 57%. The genus provides a typical example of plastome expansion induced by the expansion of non-coding regions. Considering the pros and cons of different sequencing technologies, we recommend hybrid assembly based on long and short reads for sequencing plastomes with AT-biased base composition.

Highlights

  • The average chloroplast genome size of land plants is 151 kb, with most species ranging from 130–170 kb in length, and the average GC content is 36.3% (NCBI database, 4,281 land plant plastomes, March 17, 2020) (Supplementary Table 1)

  • The plastid genomes of the two species showed the typical quadripartite structure, with two identical copies of the inverted repeat (IR) region separated by a large single copy (LSC) region and a small single copy (SSC) region (Figures 1, 2)

  • The low substitution rates might explain the unresolved relationships among sections in Cypripedium (Li et al., 2011)

Introduction

The average chloroplast genome (plastome) size of land plants is 151 kb, with most species ranging from 130–170 kb in length, and the average GC content is 36.3% (NCBI database, 4,281 land plant plastomes, March 17, 2020) (Supplementary Table 1). Wang et al. (2018) compared short-read (Illumina) data-only assembly, long-read (Oxford Nanopore) data-only assembly, and hybrid assembly involving short- and long-read data to test the accuracy of chloroplast genome assembly. They suggested that hybrid assembly provides highly accurate and complete chloroplast genome assembly.
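The two summary statistics quoted for plastomes, GC content and the share of non-coding sequence, are simple ratios. The following is a minimal sketch of how they could be computed, not the pipeline used in this study. The function names `gc_content` and `noncoding_percent`, and the convention of 0-based half-open coding intervals, are assumptions made for illustration.

```python
from collections import Counter


def gc_content(seq):
    """Return GC content as a percentage of the A/C/G/T bases in seq.

    Ambiguity codes such as N are ignored (an assumption of this sketch).
    """
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


def noncoding_percent(genome_length, coding_intervals):
    """Return the percentage of the genome not covered by coding intervals.

    coding_intervals is a list of hypothetical (start, end) pairs,
    0-based and half-open; overlapping intervals are counted once.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_end = 0
    for start, end in sorted(coding_intervals):
        # Skip the part of this interval already counted by an earlier one.
        start = max(start, current_end)
        if end > start:
            covered += end - start
            current_end = end
    return 100.0 * (genome_length - covered) / genome_length
```

For example, `gc_content` applied to a full plastome sequence string would give the kind of figure reported as 28.2% for C. subtropicum, and `noncoding_percent` applied to the annotated gene coordinates would give the kind of non-coding share reported as exceeding 57%.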
