Abstract
Generating a complete, de novo genome assembly for prokaryotes is often considered a solved problem. However, we show here that Pseudomonas koreensis P19E3 harbors multiple, near-identical repeat pairs up to 70 kilobase pairs in length, which contained several genes that may confer fitness advantages to the strain. Its complex genome, which also included a variable shufflon region, could not be de novo assembled with long reads produced by Pacific Biosciences’ technology, but required very long reads from Oxford Nanopore Technologies. Importantly, a repeat analysis, whose results we release for over 9600 prokaryotes, indicated that very complex bacterial genomes represent a general phenomenon beyond Pseudomonas. Roughly 10% of the 9331 complete bacterial genomes and a handful of the 293 complete archaeal genomes represented this ‘dark matter’ for de novo genome assembly of prokaryotes. Several of these ‘dark matter’ genome assemblies contained repeats far beyond the resolution of the sequencing technology employed and likely contain errors; other genomes were closed using labor-intensive steps such as cosmid libraries, primer walking or optical mapping. Using very long sequencing reads in combination with assembly algorithms capable of resolving long, near-identical repeats will bring most prokaryotic genomes within reach of fast and complete de novo genome assembly.
Highlights
The enormous pace in next-generation sequencing (NGS) technology development [1] has led to an exponential increase in the number of publicly available, complete prokaryotic genome assemblies [2]
Despite advances from Pacific Biosciences (PacBio) and more recently Oxford Nanopore Technologies (ONT) to sequence very long reads (>15 kb and well beyond) which allow de novo bacterial genome assembly [3], the percentage of complete genomes is still low compared to the large number of fragmented assemblies based on Illumina short reads [2,4], most of which remain at a permanent draft stage
A bacterial strain isolated during a screening of herbal plants for food spoiling and pathogenic bacteria was assigned to the species P. koreensis by MALDI biotyping [27]
Summary
The enormous pace in next-generation sequencing (NGS) technology development [1] has led to an exponential increase in the number of publicly available, complete prokaryotic genome assemblies [2]. Despite advances from Pacific Biosciences (PacBio) and, more recently, Oxford Nanopore Technologies (ONT) in sequencing very long reads (>15 kb and well beyond), which allow de novo bacterial genome assembly [3], the percentage of complete genomes is still low compared to the large number of fragmented assemblies based on Illumina short reads [2,4], most of which remain at a permanent draft stage. This can in part be attributed to a considerable lag between the development of a new technology and its broader adoption, and to the higher costs of PacBio and ONT sequencing.