Rapid multi-locus sequence typing direct from uncorrected long reads using Krocus.

Andrew J Page, Jacqueline A Keane

doi:10.7717/peerj.5233

Open Access

Abstract

Genome sequencing is rapidly being adopted in reference labs and hospitals for bacterial outbreak investigation and diagnostics, where time is critical. Seven-gene multi-locus sequence typing is a standard tool for broadly classifying samples into sequence types (STs). In many cases it allows a sample to be ruled out of an outbreak, or general characteristics of a bacterial strain to be inferred. Long-read sequencing technologies, such as those from Oxford Nanopore, can produce read data within minutes of an experiment starting, unlike short-read sequencing technologies, which require many hours or days. However, the error rates of raw uncorrected long-read data are very high. We present Krocus, which can predict an ST directly from uncorrected long reads and which was designed to consume read data as it is produced, providing results in minutes. It is the only tool which can do this from uncorrected long reads. We tested Krocus on over 700 isolates sequenced using long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore. It provides STs for isolates within 90 s on average, with a sensitivity of 94% and a specificity of 97% on real sample data, directly from uncorrected raw sequence reads. The software is written in Python and is available under the open source license GNU GPL version 3.
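
For readers unfamiliar with the two accuracy measures quoted in the abstract, the following minimal Python sketch shows the standard definitions of sensitivity and specificity. The counts are invented purely for illustration and are not taken from the paper, and how the authors classified each ST call as a true or false positive or negative is not stated here, so that mapping is left as an assumption.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual positives that were correctly identified."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of actual negatives that were correctly identified."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (NOT results from the paper).
example_sensitivity = sensitivity(true_pos=45, false_neg=5)
example_specificity = specificity(true_neg=80, false_pos=20)
print(f"sensitivity={example_sensitivity:.2f} specificity={example_specificity:.2f}")
```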

Highlights

With rapidly falling costs, long-read sequencing technologies, such as those from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), are beginning to be used for outbreak investigations (Faria et al., 2017; Quick et al., 2015) and for rapid clinical diagnostics (Votintseva et al., 2017).
The multi-locus sequence typing (MLST) alleles are contained in seven FASTA files, downloaded from PubMLST (Jolley & Maiden, 2010) or taken from the set distributed with the software.
PacBio results: each of the assemblies from the NCTC dataset was run through TS-mlst to generate a sequence type (ST).
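
As an illustration of the data layout mentioned in the highlights (alleles stored in FASTA files, with a combination of seven allele numbers defining an ST), here is a minimal Python sketch. It is not Krocus's implementation: the function names, the `locus_number` header convention, the toy sequences, and the profile table are all assumptions made for this example.

```python
from io import StringIO
from typing import Dict, Optional, TextIO, Tuple

Profile = Tuple[int, int, int, int, int, int, int]  # one allele number per locus


def read_alleles(handle: TextIO) -> Dict[str, str]:
    """Parse a FASTA stream into {allele_name: sequence}.

    Assumes headers look like '>adk_1' (locus name, underscore, allele number).
    """
    alleles: Dict[str, str] = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                alleles[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        alleles[name] = "".join(chunks)
    return alleles


def lookup_st(profile: Profile, table: Dict[Profile, int]) -> Optional[int]:
    """Return the ST for a seven-locus allele profile, or None if it is unknown."""
    return table.get(profile)


# Toy data, invented for illustration; real alleles and profiles come from PubMLST.
toy_fasta = StringIO(">adk_1\nACGT\nTTGA\n>adk_2\nACGA\n")
toy_alleles = read_alleles(toy_fasta)

toy_profiles: Dict[Profile, int] = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (2, 1, 1, 1, 1, 1, 1): 2,
}
known_st = lookup_st((2, 1, 1, 1, 1, 1, 1), toy_profiles)
unknown_st = lookup_st((9, 9, 9, 9, 9, 9, 9), toy_profiles)
print(toy_alleles, known_st, unknown_st)
```

A plain dictionary keyed by the seven-number tuple keeps the lookup exact: a profile absent from the table returns `None` rather than a nearest match.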

Summary

Introduction

Long-read sequencing technologies, such as those from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), are beginning to be used for outbreak investigations (Faria et al., 2017; Quick et al., 2015) and for rapid clinical diagnostics (Votintseva et al., 2017). Long-read sequencers from Oxford Nanopore can produce sequence reads in a matter of minutes, and sequencers from PacBio can produce sequences in a number of hours, whereas short-read sequencing technologies take hours or days. Seven-gene multi-locus sequence typing (MLST) is a widely used classification system for categorising bacteria. It can be used to quickly rule an isolate out of an outbreak, and knowing the sequence type (ST) can often allow general characteristics of a bacterium to be inferred.

How to cite this article: Page and Keane (2018), Rapid multi-locus sequence typing direct from uncorrected long reads using Krocus.

Journal: PeerJ	Publication Date: Jul 31, 2018
Citations: 24	License type: CC BY 4.0
