Abstract

MLST (multi-locus sequence typing) is a classic technique for genotyping bacteria, widely applied for pathogen outbreak surveillance. Traditionally, MLST is based on identifying sequence types from a small number of housekeeping genes. With the increasing availability of whole-genome sequencing data, MLST methods have evolved towards larger typing schemes, ranging from a few hundred genes [core genome MLST (cgMLST)] to a few thousand genes [whole genome MLST (wgMLST)]. Such large-scale MLST schemes have been shown to provide finer resolution and are increasingly used in contexts such as hospital outbreaks or foodborne pathogen outbreaks. This methodological shift raises new computational challenges, especially given the large size of the schemes involved. Very few available MLST callers are currently capable of dealing with large MLST schemes. We introduce MentaLiST, a new MLST caller, based on a k-mer voting algorithm and written in the Julia language, specifically designed and implemented to handle large typing schemes. We test it on real and simulated data to show that MentaLiST is faster than any other available MLST caller while providing the same or better accuracy, and is capable of dealing with MLST schemes with up to thousands of genes while requiring limited computational resources. MentaLiST source code and easy installation instructions using a Conda package are available at https://github.com/WGS-TB/MentaLiST.

Highlights

  • Since it was introduced by Maiden et al. in 1998 [1], multilocus sequence typing (MLST) has become a fundamental technique for classifying bacterial isolates into strains.

  • In the specific case of MLST, this has led to the emergence of MLST schemes based on a larger set of genes, such as core genome MLST, which considers the set of core genes shared by a group of related strains, and even whole genome MLST.

  • As expected for a traditional MLST scheme, all tested methods made identical calls on all 41 samples, except for SRST2: on two samples its call for gene ddl differed from the other callers (11 versus 5 in both cases) and carried the flags ‘*?’, which, according to the SRST2 documentation, indicate mismatches and uncertainty due to a low depth of coverage in certain parts of the gene.

Summary

Introduction

Since it was introduced by Maiden et al. in 1998 [1], multilocus sequence typing (MLST) has become a fundamental technique for classifying bacterial isolates into strains. It has been applied in a large number of contexts, especially related to pathogen outbreak surveillance [2]. Jolley et al. showed that traditional MLST schemes were not able to discriminate separate sublineages within a clonal complex of Neisseria meningitidis [4]. This observation has come at a time when advances in sequencing technologies and protocols have had a major impact on public health: it is now common to rapidly obtain WGS data from a pathogen outbreak, allowing for monitoring at an unprecedented level of resolution [5,6,7,8,9,10,11,12,13]. In the specific case of MLST, this has led to the emergence of MLST schemes based on a larger set of genes, such as core genome MLST (cgMLST), which considers the set of core genes shared by a group of related strains (generally a few hundred genes), and even whole genome MLST (wgMLST).
