Abstract

Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to de novo assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assemble these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps which are left due to coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb) and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
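
The idea of using a k-mer frequency table to separate chloroplast reads from nuclear reads can be illustrated with a small sketch. This is not the pipeline used in the study. It assumes that chloroplast-derived k-mers occur far more often than nuclear ones because of the high copy number of the organelle, and it keeps a read when the median count of its k-mers reaches a threshold. The function names, the median criterion and the `min_count` parameter are hypothetical choices made for illustration, and reverse complements are ignored for brevity.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_table(reads, k):
    """Count how often each k-mer occurs across all reads."""
    table = Counter()
    for read in reads:
        table.update(kmers(read, k))
    return table


def extract_high_copy_reads(reads, k, min_count):
    """Keep reads whose median k-mer count is at least min_count.

    Assumption: high-count k-mers mostly come from the chloroplast.
    Reads shorter than k have no k-mers and are skipped.
    """
    table = build_kmer_table(reads, k)
    selected = []
    for read in reads:
        counts = [table[km] for km in kmers(read, k)]
        if counts and median(counts) >= min_count:
            selected.append(read)
    return selected


if __name__ == "__main__":
    # Toy data: one sequence sampled five times, two sampled once.
    toy_reads = ["ACGTTGCA"] * 5 + ["GGGCCCTA", "TTTAAACC"]
    print(extract_high_copy_reads(toy_reads, k=4, min_count=3))
```

In practice both `k` and the threshold would need tuning to the dataset, which is in line with the abstract's remark that the choice of k and the amount of data should be optimized.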

Highlights

  • Chloroplast genomes are frequently used in systematics and phylogeography because of their simple circular structure, their predominantly clonal inheritance along the maternal line, and their high copy number in the cell (Palmer and Stein, 1986; Moore et al., 2006; Ma et al., 2013)

  • Whole genome paired-end sequences of Solanum lycopersicum and Aegilops tauschii were downloaded from the Sequence Read Archive of GenBank

  • The chloroplast genome is a rich source of molecular markers for many kinds of study, including parentage analysis, hybridization, population and genetic structure, and phylogeography

Introduction

Chloroplast genomes are frequently used in systematics and phylogeography because of their simple circular structure, their predominantly clonal inheritance along the maternal line, and their high copy number in the cell (Palmer and Stein, 1986; Moore et al., 2006; Ma et al., 2013). The chloroplast genome is often perceived to have a low amount of sequence variation, and its use has been mostly confined to studies at the interspecific and interfamilial levels (Jansen et al., 2007; Moore et al., 2007; Xi et al., 2012; Barrett et al., 2013). However, comparative analyses of complete chloroplast sequences have shown that this perception of low variation within species does not hold at the genomic scale (Whittall et al., 2010; Besnard et al., 2011; Kane et al., 2012). Using the complete chloroplast genome will undoubtedly be the best way to exploit the information in this organelle genome.
