Abstract

Molecular typing is used to differentiate microorganisms at the subspecies or strain level for epidemiological investigations, infection control, public health and environmental sampling. DNA sequence-based typing methods require authoritative databases that link sequence variants to nomenclature in order to facilitate communication and comparison of identified types in national or global settings. The PubMLST website (https://pubmlst.org/) fulfils this role for over a hundred microorganisms for which it hosts curated molecular sequence typing data, providing sequence and allelic profile definitions for multi-locus sequence typing (MLST) and single-gene typing approaches. In recent years, these have expanded to cover the whole genome with schemes such as core genome MLST (cgMLST) and whole genome MLST (wgMLST), which catalogue the allelic diversity found in hundreds to thousands of genes. These approaches provide a common nomenclature for high-resolution strain characterization and comparison. Molecular typing information is linked to isolate provenance, phenotype and, increasingly, genome assemblies, providing a resource for outbreak investigation and research into population structure, gene association, global epidemiology and vaccine coverage. A Representational State Transfer (REST) Application Programming Interface (API) has been developed for the PubMLST website to make these large quantities of structured molecular typing and whole genome sequence data available for programmatic access by any third-party application. The API is an integral component of the Bacterial Isolate Genome Sequence Database (BIGSdb) platform that is used to host PubMLST resources, and exposes all public data within the site. In addition to data browsing, searching and download, the API supports authentication and submission of new data to curator queues.

Database URL: http://rest.pubmlst.org/
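As a rough illustration of the programmatic access described above, the following Python sketch builds resource URLs under the API root and decodes a JSON response. Only the root address comes from the abstract. The route layout (`db/<database>/schemes/<scheme>/profiles/<profile>`), the database name `pubmlst_neisseria_seqdef`, the helper names and the assumption that responses are JSON are illustrative and should be checked against the API documentation before use.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# API root as given in the abstract (shown there with the http scheme).
BASE_URL = "https://rest.pubmlst.org"


def build_url(*segments: object) -> str:
    """Join path segments onto the API root, percent-encoding each one."""
    return "/".join([BASE_URL] + [quote(str(s), safe="") for s in segments])


def profile_url(database: str, scheme_id: int, profile_id: int) -> str:
    """URL of one allelic profile.

    ASSUMPTION: the route layout used here is illustrative, not taken
    from the text above.
    """
    return build_url("db", database, "schemes", scheme_id, "profiles", profile_id)


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a resource and decode its body as JSON (makes a network request).

    ASSUMPTION: the API returns a JSON object for this resource.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(profile_url("pubmlst_neisseria_seqdef", 1, 11))` would request a single profile record, assuming that database and scheme exist under those identifiers. Keeping URL construction separate from the network call lets the route logic be checked offline.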

Highlights

  • The PubMLST website was established in 2003 to host multi-locus sequence typing (MLST) data, initially for the Neisseria [1] and Campylobacter [2] schemes developed at the University of Oxford, although its origins date back to the first MLST site established in 1998 [1]

  • Curators validate new sequence data and allelic profiles submitted by end users, ensuring that any new variant is real before it is assigned an identifier and that it can be associated with representative isolate information

  • With the advent of routinely available whole genome sequence (WGS) data, the Bacterial Isolate Genome Sequence Database (BIGSdb) platform [5] was developed to host, flexibly organize, and extract allelic variants for any locus of interest. It was designed to store genome sequence data as draft or complete assemblies, associated with their provenance metadata, along with any number and size of typing schemes and nomenclatures, ranging from conventional MLST schemes through to collections of loci that make up the core genome (e.g. core genome MLST, cgMLST [6,7,8,9,10]) and pangenome of a species or genus [11]


Summary

Introduction

The PubMLST website (https://pubmlst.org) was established in 2003 to host multi-locus sequence typing (MLST) data, initially for the Neisseria [1] and Campylobacter [2] schemes developed at the University of Oxford, although its origins date back to the first MLST site established in 1998 (http://mlst.zoo.ox.ac.uk) [1]. Curators validate new sequence data and allelic profiles submitted by end users, ensuring that any new variant is real before it is assigned an identifier and that it can be associated with representative isolate information.
