Abstract

(1) Background: Short-read sequencing allows rapid and accurate analysis of the whole bacterial genome but does not usually enable complete genome assembly. Long-read sequencing greatly assists with the resolution of complex bacterial genomes, particularly when combined with short-read Illumina data. However, it is not clear how different assembly strategies affect genomic accuracy, completeness, and protein prediction. (2) Methods: We compare different assembly strategies for Haemophilus parasuis, the cause of Glässer’s disease in swine (characterized by fibrinous polyserositis and arthritis), using Illumina sequencing and long reads from either the Oxford Nanopore Technologies (ONT) or the Pacific Biosciences (PacBio) SMRT platform. (3) Results: Assembly with either PacBio or ONT reads, followed by polishing with Illumina reads, facilitated high-quality genome reconstruction and was superior to the long-read-only assembly and hybrid-assembly strategies in terms of accuracy and completeness. An equally effective approach was correction with Homopolish after the ONT-only assembly, which has the advantage of not requiring additional Illumina sequencing. Furthermore, aligning transcripts to the assembled genomes and their predicted CDSs showed that the sequencing errors of the ONT assembly were mainly indels generated when homopolymer regions were sequenced, which critically affected protein prediction. Polishing can fill these indels and correct other errors. (4) Conclusions: Bacterial genomes can be assembled directly from long-read sequencing data. To maximize assembly accuracy, it is essential to polish the assembly with homologous sequences of related genomes or with sequencing data from short-read technology.
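The effect of a homopolymer indel on protein prediction can be illustrated with a minimal sketch. The sequence below is a made-up toy example, not taken from the H. parasuis assemblies, and the names `translate`, `reference`, and `ont_like` are hypothetical. It assumes the standard codon table and shows how dropping a single base from a run of identical bases shifts the reading frame of everything downstream.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence from its first base until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy coding sequence (assumption, for illustration only) containing
# a six-base A homopolymer: ATG GCA AAA AAG CTT GAT CCG TAA
reference = "ATGGCAAAAAAGCTTGATCCGTAA"

# Simulate a one-base deletion inside the homopolymer run.
ont_like = reference[:6] + reference[7:]

print(translate(reference))  # MAKKLDP
print(translate(ont_like))   # MAKSLIR (frame shifted, stop codon lost)
```

Only one base differs between the two sequences, yet every residue after the homopolymer changes and the original stop codon is no longer read in frame. This is why polishing such indels matters for CDS prediction.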

Highlights

  • Although second-generation sequencing (SGS) platforms, such as Illumina, have significant limitations, they are widely used in bacterial-genome research [1]

  • For the Pacific Biosciences (PacBio) SMRT sequencing data, we obtained reads with a depth of about 50×

  • The H. parasuis genome could be assembled de novo with a read depth of about

Introduction

Although second-generation sequencing (SGS) platforms, such as Illumina, have significant limitations, they are widely used in bacterial-genome research [1]. For a single laboratory, an SGS device requires a significant capital investment (980,000), and operating the instrument places strict requirements on the laboratory environment and the operators’ skills. SGS sample preparation and library construction are cumbersome. The sequencing process is time-consuming, and the output lags. SGS platforms rely on PCR amplification to sequence DNA molecules. They are therefore limited by amplification and sequencing bias, and their read lengths are restricted to a few hundred bases.
