Abstract

Background: We benchmarked sequencing technology and assembly strategies for short-read, long-read, and hybrid assemblers with respect to correctness, contiguity, and completeness of assemblies in genomes of Francisella tularensis. Benchmarking allowed in-depth analyses of genomic structures of the Francisella pathogenicity islands and insertion sequences. Five major high-throughput sequencing technologies were applied, including next-generation “short-read” and third-generation “long-read” sequencing methods.

Results: We focused on short-read assemblers, hybrid assemblers, and analysis of the genomic structure, with particular emphasis on insertion sequences and the Francisella pathogenicity island. Of the eight short-read assembly methods tested, the A5-miseq pipeline performed best for MiSeq data, Mira for Ion Torrent data, and ABySS for HiSeq data. Two approaches were applied to benchmark long-read and hybrid assembly strategies: long-read-first assembly followed by correction with short reads (Canu/Pilon, Flye/Pilon) and short-read-first assembly along with scaffolding based on long reads (Unicycler, SPAdes). Hybrid assembly can resolve large repetitive regions best with a “long-read first” approach.

Conclusions: Genomic structures of the Francisella pathogenicity islands were frequently misassembled. Insertion sequences (IS) could be used to perform an evolutionary conservation analysis. The phylogenetic structure of the insertion sequences and their evolution within the clades elucidated the clade structure of the highly conserved F. tularensis.

Highlights

  • We benchmarked sequencing technology and assembly strategies for short-read, long-read, and hybrid assemblers with respect to correctness, contiguity, and completeness of assemblies in genomes of Francisella tularensis

  • We evaluated five independent sequencing technologies: Illumina MiSeq (MiSeq), Illumina HiSeq (HiSeq), and Ion Torrent’s Ion S5XL (Ion Torrent) to generate short-read sequences; Pacific Biosciences RS (PacBio) and Oxford Nanopore Technologies MinION (MinION) for long-read sequences

  • 99.8% of reads could be mapped to the respective reference genome, resulting in 99.2–99.7% of bases covered at an average depth of 78–140x
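
The mapping figures in the last highlight combine two distinct metrics: the fraction of reference bases covered by at least one read (breadth) and the average number of reads per base (depth). The following is a minimal sketch of how both could be computed, assuming per-base read depths have already been exported from a read mapper. The function name `coverage_summary` and the toy values are illustrative and not taken from the study, and averaging depth over all reference positions (rather than covered positions only) is an assumption.

```python
from typing import Sequence, Tuple


def coverage_summary(depths: Sequence[int]) -> Tuple[float, float]:
    """Return (breadth, mean_depth) for per-base read depths.

    breadth    = fraction of reference positions covered by at least one read
    mean_depth = average read depth over all reference positions (assumption)
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for d in depths if d > 0)
    breadth = covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return breadth, mean_depth


# Toy example (hypothetical values): a 10-base reference with two uncovered positions.
toy_depths = [0, 3, 5, 5, 4, 0, 2, 6, 5, 10]
breadth, mean_depth = coverage_summary(toy_depths)
print(f"bases covered: {breadth:.1%}, mean depth: {mean_depth:.1f}x")
# prints "bases covered: 80.0%, mean depth: 4.0x"
```

On real data the depth list would have one entry per reference position, so a value such as 99.2% breadth means that fewer than 1 in 100 positions received no reads.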

Summary

Introduction

We benchmarked sequencing technology and assembly strategies for short-read, long-read, and hybrid assemblers with respect to correctness, contiguity, and completeness of assemblies in genomes of Francisella tularensis. Benchmarking allowed in-depth analyses of genomic structures of the Francisella pathogenicity islands and insertion sequences. F. tularensis is considered a potential biological agent. For phylogenetic studies and outbreak analyses, high-quality reference genomes are needed [3, 4]. Studies of the genomic structure of Francisella, such as pathogenicity islands and insertion sequences, have provided new insights into the development of the species. Insertion sequences (IS elements) are transposable elements that code only for transposition activity and can occur in different copy numbers and positions within the genome [5, 6]. IS elements constitute genomic rearrangement events during evolution that are correlated with pathogenicity [7, 8, 9].

