Abstract

The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although the PubMLST website was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998, the software it uses, the Bacterial Isolate Genome Sequence Database (BIGSdb, published in 2010), enables PubMLST to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes. Here we describe developments in the BIGSdb software made from its publication to June 2018 and show how the platform realises microbial population genomics for a wide range of applications. The system is based on the gene-by-gene analysis of microbial genomes, with each deposited sequence annotated and curated to identify the genes present and systematically catalogue their variation. Although the system was originally intended as a means of characterising isolates with typing schemes, its synthesis of sequences and records of genetic variation with provenance and phenotype data provides a highly scalable means (whole genome sequence data for tens of thousands of isolates) of addressing a wide range of functional questions, including the prediction of antimicrobial resistance, likely cross-reactivity with vaccine antigens, and the functional activities of different variants that lead to key phenotypes. There are no limitations to the number of sequences, genetic loci, allelic variants or schemes (combinations of loci) that can be included, enabling each database to represent an expanding catalogue of the genetic variation of the population in question.
In addition to providing web-accessible analyses and links to third-party analysis and visualisation tools, the BIGSdb software includes a RESTful application programming interface (API) that enables access to all the underlying data for third-party applications and data analysis pipelines.
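As a rough illustration of how a data analysis pipeline might consume such an API, the Python sketch below builds resource URLs, fetches JSON with the standard library, and extracts locus names from a response. The base URL, the `db/<database>/loci` path, the `pubmlst_neisseria_seqdef` database name and the response shape (a `loci` list of resource URLs) are assumptions to check against the BIGSdb API documentation, and paging of long result lists is not handled. The example parses a canned payload so that it runs without network access.

```python
from __future__ import annotations

import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: base URL of the public PubMLST REST API.
BASE_URL = "https://rest.pubmlst.org/"


def resource_url(*parts: str) -> str:
    """Join path segments onto BASE_URL (the paths used here are illustrative)."""
    return urljoin(BASE_URL, "/".join(part.strip("/") for part in parts))


def fetch_json(url: str, timeout: float = 30.0):
    """GET a URL and decode its JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def list_loci(payload: dict) -> list[str]:
    """Extract locus names from an assumed response of the form {"loci": [url, ...]}."""
    return [url.rstrip("/").rsplit("/", 1)[-1] for url in payload.get("loci", [])]


# Hypothetical live call (not executed here):
# payload = fetch_json(resource_url("db", "pubmlst_neisseria_seqdef", "loci"))

# Canned payload in the assumed shape, so the sketch runs offline.
sample = {
    "records": 2,
    "loci": [
        "https://rest.pubmlst.org/db/pubmlst_neisseria_seqdef/loci/abcZ",
        "https://rest.pubmlst.org/db/pubmlst_neisseria_seqdef/loci/adk",
    ],
}
print(list_loci(sample))
```

Keeping the network call separate from the parsing step means the parsing logic can be tested on saved responses before it is pointed at a live database.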

Highlights

  • Our ability to study complex phenotypes, i.e. those that depend on the interactions of multiple components of an organism and its environment, has been enhanced during the past 20 years by the very large increases in our capacity to collect and analyse biological information

  • The PubMLST website was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998; the software it uses, the Bacterial Isolate Genome Sequence Database (BIGSdb, published in 2010), enables PubMLST to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes

  • We describe developments in the Bacterial Isolate Genome Sequence Database (BIGSdb) software made from publication to June 2018 and show how the platform realises microbial population genomics for a wide range of applications


Summary

Introduction

Our ability to study complex phenotypes, i.e. those that depend on the interactions of multiple components of an organism and its environment, has been enhanced during the past 20 years by the very large increases in our capacity to collect and analyse biological information. Amongst the most important of these developments have been very high-throughput sequencing methods and the informatics approaches required to interpret the large volumes of data that they generate; at the time of writing, major challenges remain in realising the potential of the opportunities presented by such developments[1]. These data must be stored, organised, curated, interpreted, analysed, and disseminated in a usable way. The gene-by-gene approach exemplified by MLST is inherently scalable with respect to the number of loci and individual organisms included[16], and the BIGSdb platform has been continually developed and extended to provide additional functionality.
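The gene-by-gene idea can be illustrated with a minimal Python sketch, assuming a much simplified model: each locus keeps a catalogue mapping distinct sequences to allele numbers, and a scheme maps each combination of allele numbers (a profile) to a type number, as an MLST sequence type does. The class names, the toy sequences and the automatic first-come numbering are hypothetical; in the curated PubMLST databases, new alleles and profiles are assigned through curation rather than automatically. Because each new isolate only adds entries to per-locus lookup tables, the structure grows with the number of loci and isolates without reanalysing earlier records.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LocusCatalogue:
    """Hypothetical catalogue of distinct allele sequences for one locus."""
    name: str
    alleles: dict[str, int] = field(default_factory=dict)  # sequence -> allele number

    def designate(self, sequence: str) -> int:
        """Return the existing allele number, or register the sequence as a new allele.

        Simplification: numbers are assigned automatically in order of first sight.
        """
        seq = sequence.upper()
        if seq not in self.alleles:
            self.alleles[seq] = len(self.alleles) + 1
        return self.alleles[seq]


@dataclass
class Scheme:
    """Hypothetical scheme: ordered loci whose allele combinations get a type number."""
    loci: list[str]
    profiles: dict[tuple[int, ...], int] = field(default_factory=dict)

    def type_profile(self, designations: dict[str, int]) -> int:
        """Look up (or register) the type for a profile; every locus must be present."""
        profile = tuple(designations[locus] for locus in self.loci)
        if profile not in self.profiles:
            self.profiles[profile] = len(self.profiles) + 1
        return self.profiles[profile]


# Toy two-locus scheme; the sequences are made up and far shorter than real alleles.
catalogues = {name: LocusCatalogue(name) for name in ["abcZ", "adk"]}
scheme = Scheme(loci=["abcZ", "adk"])


def type_isolate(genes: dict[str, str]) -> tuple[dict[str, int], int]:
    """Designate alleles for each locus of one isolate, then type its profile."""
    designations = {locus: catalogues[locus].designate(seq) for locus, seq in genes.items()}
    return designations, scheme.type_profile(designations)


print(type_isolate({"abcZ": "ATGAAA", "adk": "ATGCCC"}))  # ({'abcZ': 1, 'adk': 1}, 1)
print(type_isolate({"abcZ": "atgaaa", "adk": "ATGCCG"}))  # ({'abcZ': 1, 'adk': 2}, 2)
```

The second isolate shares its `abcZ` allele with the first but carries a new `adk` variant, so it receives a new profile and type number while the existing catalogue entries are reused.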
