Abstract

Background: The development of long-read sequencing technologies, such as single-molecule real-time (SMRT) sequencing by PacBio, has produced a revolution in the sequencing of small genomes. Sequencing organelle genomes using PacBio long-read data is a cost-effective, straightforward approach. Nevertheless, simple-to-use software for performing the assembly from raw reads is currently scarce.

Results: We present Organelle-PBA, a Perl program designed specifically for the assembly of chloroplast and mitochondrial genomes. For chloroplast genomes, the program selects the chloroplast reads from a whole genome sequencing pool by mapping the reads to a reference sequence from a closely related species, and then performs read correction and de novo assembly using Sprai. Organelle-PBA completes the assembly process with an additional scaffolding step using SSPACE-LongRead. The program then detects the chloroplast inverted repeats, and reassembles and re-orients the assembly based on the organelle origin of the reference. We have evaluated the performance of the software using PacBio reads from different species, at different read coverages, and with different reference genomes. Finally, we present the assembly of two novel chloroplast genomes from the species Picea glauca (Pinaceae) and Sinningia speciosa (Gesneriaceae).

Conclusion: Organelle-PBA is an easy-to-use Perl-based software pipeline that was written specifically to assemble mitochondrial and chloroplast genomes from whole genome PacBio reads. The program is available at https://github.com/aubombarely/Organelle_PBA.

Highlights

  • The development of long-read sequencing technologies, such as single-molecule real-time (SMRT) sequencing by Pacific Biosciences (PacBio), has produced a revolution in the sequencing of small genomes

  • Mus musculus mitochondrial genome assembly: sets of 50,000, 100,000, and 163,477 randomly selected PacBio reads from the house mouse, Mus musculus (SRA dataset: ERR731675), were used to test the mitochondrial genome assembly with different mitochondrial reference genomes: M. musculus (NC_005089.1), Mus caroli (NC_025268.1), Rattus norvegicus (NC_001665.2), and Marmota himalayana (NC_018367.1)

  • Our analysis shows that an average whole genome sequencing project contains ~0.2% of animal mitochondrial DNA and ~20% of plant chloroplast DNA, which, in most cases, is enough to reach >50X organelle genome coverage
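The coverage claim in the last highlight can be checked with simple arithmetic: expected organelle coverage is roughly the total sequenced bases, multiplied by the organelle fraction, divided by the organelle genome size. The following is a minimal sketch, not part of Organelle-PBA. The function name is hypothetical, and the inputs are illustrative assumptions: the 163,477-read mouse set and the 12 Kb average read length quoted in the text, a mitochondrial genome of roughly 16.3 Kb, and the further assumption that organelle reads have the same mean length as the rest of the pool.

```python
def organelle_coverage(n_reads, mean_read_len, organelle_fraction, organelle_size):
    """Estimate expected organelle genome coverage (in X).

    n_reads            -- number of reads in the whole genome sequencing pool
    mean_read_len      -- mean read length in bases (assumed equal for all reads)
    organelle_fraction -- fraction of the pool derived from the organelle (0..1)
    organelle_size     -- organelle genome size in bases
    """
    if organelle_size <= 0:
        raise ValueError("organelle_size must be positive")
    organelle_bases = n_reads * mean_read_len * organelle_fraction
    return organelle_bases / organelle_size


if __name__ == "__main__":
    # Illustrative values only: ~0.2% mitochondrial fraction, ~16.3 Kb genome (assumed).
    cov = organelle_coverage(
        n_reads=163_477,
        mean_read_len=12_000,
        organelle_fraction=0.002,
        organelle_size=16_300,
    )
    print(f"Estimated mitochondrial coverage: {cov:.0f}X")
```

Under these assumptions the estimate lands well above the 50X threshold mentioned above; shorter reads or a smaller organelle fraction lower it proportionally.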

Summary

Introduction

The development of long-read sequencing technologies has produced a revolution in the sequencing of small genomes. Single-molecule real-time (SMRT) sequencing, developed by Pacific Biosciences (PacBio), can produce millions of long reads (1 Kb or longer, with a current average of 12 Kb) per run. SMRT sequencing is based on single-molecule real-time imaging of the incorporation of fluorescently tagged nucleotides into a DNA template molecule [1]. This technology has been successfully applied to a wide range of experiments and species, such as the sequencing of DNA amplicons [2] and transcriptomes [3]. A popular overlap-layout-consensus (OLC) program, the Celera Assembler (CA; [7]), has been updated to assemble PacBio reads.
