Abstract
Background: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly.
Results: As part of the Vertebrate Genomes Project (VGP), we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100–300 bp, Illumina) reads. Our pipeline yields complete mitogenome assemblies for 100 vertebrate species of the VGP. We observe that tissue type and library size selection have a considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also reveal novel gene-region duplications. The presence of repeats and duplications in over half of the species assembled here indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization.
Conclusions: Our results indicate that, even in the “simple” case of vertebrate mitogenomes, the completeness of many currently available reference sequences can be further improved, and that caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.
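The caution about short reads can be illustrated with a rough rule of thumb that is not taken from the paper: a read can only place a repeat copy unambiguously if it spans the whole repeat plus at least one unique base on each side. The sketch below finds the longest exact repeat in a toy sequence by binary search over repeat length and compares it with an Illumina-like (150 bp) and a long-read (10 kbp) length. The toy sequence, the function names, and the one-base flank rule are hypothetical simplifications; real control-region repeats are often inexact, which this exact-match sketch ignores.

```python
from typing import Optional


def _repeat_of_length(seq: str, k: int) -> Optional[str]:
    """Return some length-k substring occurring at least twice in seq, else None."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return kmer
        seen.add(kmer)
    return None


def longest_repeat(seq: str) -> str:
    """Longest substring occurring at least twice (copies may overlap).

    If a length-k substring repeats, so does its length-(k-1) prefix, so the
    predicate is monotone in k and a binary search over k is valid.
    """
    lo, hi, best = 1, len(seq) - 1, ""
    while lo <= hi:
        mid = (lo + hi) // 2
        hit = _repeat_of_length(seq, mid)
        if hit is not None:
            best, lo = hit, mid + 1
        else:
            hi = mid - 1
    return best


def read_bridges_repeat(read_len: int, repeat_len: int) -> bool:
    """Heuristic: the read must cover the repeat plus one unique base on each side."""
    return read_len >= repeat_len + 2


if __name__ == "__main__":
    # Hypothetical toy sequence: unique flanks around a 40-copy, 10-bp tandem array (400 bp).
    toy = "GGCATTCAGT" + "ATCGGCTAAC" * 40 + "CTTAGGACCA"
    rep = longest_repeat(toy)
    print(len(rep))  # 390
    for read_len in (150, 10_000):
        print(read_len, read_bridges_repeat(read_len, len(rep)))
    # 150 False
    # 10000 True
```

The binary search keeps the example short; for whole assemblies a suffix array or a dedicated repeat finder would be the usual choice.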
Highlights
The Vertebrate Genomes Project (VGP) version 1 assembly pipeline developed for the nuclear genome uses Continuous Long Reads (CLR) and the Pacific Biosciences (PacBio) assembler FALCON to generate contigs [35, 36]
MitoVGP uses bait-reads to fish out the mitogenome long reads from whole-genome sequencing (WGS) data, assembles complete gapless mitogenome contigs, and polishes for base accuracy using short reads (Additional file 1: Fig. S1; workflow described in the “Methods”)
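The idea of fishing mitochondrial reads out of WGS data by similarity to a bait can be pictured with a small sketch. This is not mitoVGP's implementation, which is described in the paper's Methods; here a simple shared-k-mer filter is swapped in: a read is kept when at least a chosen fraction of its distinct k-mers also occur in a reference ("bait") mitogenome, checked on both strands and with the circular genome wrapped at its origin. The names `build_bait_index` and `fish_reads`, the k-mer size, the 0.5 threshold, and the toy sequences are all hypothetical. A production pipeline would use a read aligner, and nuclear copies of mitochondrial DNA (NUMTs) can also pass a pure similarity filter.

```python
from typing import Iterable, Iterator, Set, Tuple

# Complement table; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bait_index(reference: str, k: int) -> Set[str]:
    """k-mers of a circular reference mitogenome on both strands."""
    # Append the first k-1 bases so k-mers crossing the linearization point are kept.
    circular = reference + reference[:k - 1]
    return kmers(circular, k) | kmers(revcomp(circular), k)


def fish_reads(reads: Iterable[Tuple[str, str]], bait: Set[str],
               k: int, min_shared: float) -> Iterator[str]:
    """Yield names of reads whose fraction of distinct k-mers found in bait >= min_shared."""
    for name, seq in reads:
        read_kmers = kmers(seq, k)
        if not read_kmers:  # read shorter than k
            continue
        shared = len(read_kmers & bait) / len(read_kmers)
        if shared >= min_shared:
            yield name


if __name__ == "__main__":
    K = 11
    ref = "GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGG"  # toy 54-bp bait
    bait = build_bait_index(ref, K)
    reads = [
        ("read_a", ref[5:45]),                     # forward-strand read
        ("read_b", revcomp(ref[40:] + ref[:20])),  # reverse strand, spans the origin
        ("read_c", "A" * 30),                      # unrelated low-complexity read
    ]
    print(list(fish_reads(reads, bait, K, min_shared=0.5)))  # ['read_a', 'read_b']
```

Wrapping the reference matters for `read_b`: without the appended k-1 bases, the k-mers crossing the origin would be missing from the bait and that read's shared fraction would drop.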
Summary
Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. Mitochondrial DNA (mtDNA) varies from 14 to over 20 kbp in size, and although gene order can vary [5, 6], its gene content is highly conserved [2]. It usually contains 37 genes, encoding 2 ribosomal RNAs (rRNAs), 13 proteins, and 22 transfer RNAs (tRNAs). This “mitogenome” generally has short repetitive non-coding sequences, normally within a single control region (CR). Relatively large repetitive regions, potentially heteroplasmic, have also been reported, and their biological significance is still unclear [6, 7].
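The usual tally of 2 + 13 + 22 = 37 genes gives a simple sanity check for an annotated assembly, in the spirit of the abstract's points about incomplete genes and duplicated gene regions. The sketch below is a hypothetical helper, not part of mitoVGP. It assumes the annotation has already been reduced to (type, name) pairs with one unique name per canonical gene, and it only counts: it cannot tell a truncated gene from a complete one.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Usual gene content from the text: 2 rRNAs + 13 proteins + 22 tRNAs = 37 genes.
EXPECTED: Dict[str, int] = {"rRNA": 2, "protein": 13, "tRNA": 22}
assert sum(EXPECTED.values()) == 37

Feature = Tuple[str, str]  # (gene type, gene name)


def inventory(features: List[Feature]) -> Tuple[Dict[str, int], List[str]]:
    """Compare annotated gene counts with EXPECTED.

    Returns (delta, duplicated): delta[type] is observed minus expected
    (negative = missing, positive = extra copies), and duplicated lists gene
    names annotated more than once. Names must be unique per canonical gene,
    e.g. the two leucine tRNAs need distinct labels.
    """
    by_type = Counter(kind for kind, _ in features)
    delta = {kind: by_type[kind] - n for kind, n in EXPECTED.items()}
    name_counts = Counter(name for _, name in features)
    duplicated = sorted(name for name, c in name_counts.items() if c > 1)
    return delta, duplicated


if __name__ == "__main__":
    # Hypothetical placeholder names standing in for a real annotation.
    canonical = ([("rRNA", f"rRNA_{i}") for i in range(2)]
                 + [("protein", f"prot_{i}") for i in range(13)]
                 + [("tRNA", f"tRNA_{i}") for i in range(22)])
    print(inventory(canonical))
    # ({'rRNA': 0, 'protein': 0, 'tRNA': 0}, [])
    print(inventory(canonical + [("protein", "prot_5")]))
    # ({'rRNA': 0, 'protein': 1, 'tRNA': 0}, ['prot_5'])
    print(inventory(canonical[:-1]))
    # ({'rRNA': 0, 'protein': 0, 'tRNA': -1}, [])
```

A positive delta with a named duplicate points to a possible gene-region duplication worth inspecting; a negative delta points to a missing or unannotated gene.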