Abstract
Genome sequencing is rapidly being adopted in reference labs and hospitals for bacterial outbreak investigation and diagnostics where time is critical. Seven gene multi-locus sequence typing is a standard tool for broadly classifying samples into sequence types (STs), allowing, in many cases, to rule a sample out of an outbreak, or allowing for general characteristics about a bacterial strain to be inferred. Long-read sequencing technologies, such as from Oxford Nanopore, can produce read data within minutes of an experiment starting, unlike short-read sequencing technologies which require many hours/days. However, the error rates of raw uncorrected long read data are very high. We present Krocus which can predict a ST directly from uncorrected long reads, and which was designed to consume read data as it is produced, providing results in minutes. It is the only tool which can do this from uncorrected long reads. We tested Krocus on over 700 isolates sequenced using long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore. It provides STs for isolates on average within 90 s, with a sensitivity of 94% and specificity of 97% on real sample data, directly from uncorrected raw sequence reads. The software is written in Python and is available under the open source license GNU GPL version 3.
Highlights
With rapidly falling costs, long-read sequencing technologies, such as from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), are beginning to be used for outbreak investigations (Faria et al, 2017; Quick et al, 2015) and for rapid clinical diagnostics (Votintseva et al, 2017)
The multi-locus sequence typing (MLST) alleles are contained in seven FASTA files, downloaded from PubMLST (Jolley & Maiden, 2010) or taken from the set distributed with the software
PacBio results Each of the assemblies from the NCTC dataset were run through TS-mlst to generate a sequence types (STs)
Summary
Long-read sequencing technologies, such as from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), are beginning to be used for outbreak investigations (Faria et al, 2017; Quick et al, 2015) and for rapid clinical diagnostics (Votintseva et al, 2017). Long-read sequencers from Oxford Nanopore can produce sequence reads in a matter of minutes and sequencers from PacBio can produce sequences in a number of hours compared to short-read sequencing technologies which takes hours/days. Seven gene multi-locus sequence typing (MLST) is a widely used classification system for categorising bacteria. It can be used to quickly rule an isolate out of an outbreak and knowing a sequence type (ST) can often allow for general characteristics of a bacteria to be inferred. How to cite this article Page and Keane (2018), Rapid multi-locus sequence typing direct from uncorrected long reads using Krocus.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.