Abstract
Background
We benchmarked sequencing technologies and assembly strategies for short-read, long-read, and hybrid assemblers with respect to the correctness, contiguity, and completeness of assemblies of Francisella tularensis genomes. Benchmarking allowed in-depth analyses of the genomic structures of the Francisella pathogenicity islands and insertion sequences. Five major high-throughput sequencing technologies were applied, including next-generation “short-read” and third-generation “long-read” sequencing methods.
Results
We focused on short-read assemblers, hybrid assemblers, and analysis of the genomic structure, with particular emphasis on insertion sequences and the Francisella pathogenicity island. Of eight short-read assembly methods, the A5-miseq pipeline performed best for MiSeq data, Mira for Ion Torrent data, and ABySS for HiSeq data. Two approaches were applied to benchmark long-read and hybrid assembly strategies: long-read-first assembly followed by correction with short reads (Canu/Pilon, Flye/Pilon), and short-read-first assembly followed by scaffolding with long reads (Unicycler, SPAdes). Hybrid assembly resolved large repetitive regions best with the “long-read-first” approach.
Conclusions
The genomic structures of the Francisella pathogenicity islands were frequently misassembled. Insertion sequences (IS) could be used to perform an evolutionary conservation analysis. The phylogenetic structure of the insertion sequences and their evolution within the clades elucidated the clade structure of the highly conserved F. tularensis.
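The abstract scores assemblies on contiguity but does not name the metric used. A common contiguity measure is N50, and the sketch below assumes that metric purely for illustration. N50 is the contig length L such that contigs of length at least L together cover at least half of the total assembly length. The function name `n50` and the example contig lengths are hypothetical and do not come from the study.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly (assumed contiguity metric, for illustration).

    Contigs are sorted from longest to shortest and their lengths accumulated;
    the N50 is the length of the contig at which the running total first
    reaches half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("assembly contains no contigs")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid float division.
        if 2 * running >= total:
            return length
    # The loop always returns for non-empty input, because the running sum
    # eventually equals the total.
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths of a fragmented short-read assembly:
print(n50([100, 200, 300, 400]))  # 400 + 300 = 700 >= 500, so N50 = 300
```

An assembly that recovers the genome as one closed chromosome has an N50 equal to the chromosome length. Because fragmentation lowers N50, the metric captures the contiguity gap that long reads help close.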
Highlights
We benchmarked sequencing technologies and assembly strategies for short-read, long-read, and hybrid assemblers with respect to the correctness, contiguity, and completeness of assemblies of Francisella tularensis genomes
We evaluated five independent sequencing technologies: Illumina MiSeq (MiSeq), Illumina HiSeq (HiSeq), and Ion Torrent Ion S5XL (Ion Torrent) to generate short-read sequences, and Pacific Biosciences RS (PacBio) and Oxford Nanopore Technologies MinION (MinION) to generate long-read sequences
99.8% of reads could be mapped to the respective reference genome, resulting in 99.2–99.7% of bases covered at an average depth of 78–140×
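The highlights above quote three mapping statistics: the fraction of reads mapped, the fraction of reference bases covered (breadth), and the average depth. The minimal sketch below shows how these figures are conventionally computed. It assumes a per-position depth profile that includes zero-depth positions, such as the output of `samtools depth -a`. The function names, the `min_depth` threshold, and the toy numbers are illustrative assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CoverageSummary:
    breadth: float     # fraction of reference positions with depth >= min_depth
    mean_depth: float  # average depth over all reference positions


def summarize_coverage(depths: Sequence[int], min_depth: int = 1) -> CoverageSummary:
    """Summarize a per-position depth profile (hypothetical helper)."""
    if not depths:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d >= min_depth)
    return CoverageSummary(
        breadth=covered / len(depths),
        mean_depth=sum(depths) / len(depths),
    )


def mapping_rate(mapped_reads: int, total_reads: int) -> float:
    """Fraction of reads aligned to the reference (hypothetical helper)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


# Toy profile: one uncovered position out of four.
summary = summarize_coverage([0, 10, 20, 30])
print(summary.breadth, summary.mean_depth)  # 0.75 15.0
print(mapping_rate(998, 1000))              # 0.998
```

The mean depth here averages over every reference position, including uncovered ones. Averaging only over covered positions would give a higher value, so it matters which convention a reported depth range uses.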
Summary
Francisella tularensis is considered a potential biological agent. High-quality reference genomes are needed for phylogenetic studies and outbreak analyses [3, 4]. Studies of the genomic structure of Francisella, such as its pathogenicity islands and insertion sequences, have provided new insights into the development of the species. Insertion sequences (IS elements) are transposable elements that code only for transposition activity and can occur in different copy numbers and at different positions within the genome [5, 6]. IS elements mediate genomic rearrangement events during evolution that are correlated with pathogenicity [7, 8, 9].