Abstract

The study of microbial communities is hampered by the large fraction of still unknown bacteria. However, many of these species have been isolated, yet lack a validly published name or description. The validation of names for novel bacteria requires that the uniqueness of those taxa is demonstrated and their properties are described. The accepted format for this is the protologue, which can be time-consuming to create. Hence, many research fields in microbiology and biotechnology will greatly benefit from new approaches that reduce the workload and harmonise the generation of protologues. We have developed Protologger, a bioinformatic tool that automatically generates all the necessary readouts for writing a detailed protologue. By producing multiple taxonomic outputs, functional features and ecological analyses from the 16S rRNA gene and genome sequences of a single species, it substantially reduces the time needed to gather the information for describing novel taxa. The usefulness of Protologger was demonstrated by using three published isolate collections to describe 34 novel taxa, encompassing 17 novel species and 17 novel genera, including the automatic generation of ecologically and functionally relevant names. We also highlight the need to utilise multiple taxonomic delineation methods: although inconsistencies between the methods occur, a combined approach provides robust placement. Protologger is open source; all scripts and datasets are available, along with a webserver at www.protologger.de.

Highlights

  • The recent renaissance of cultivation has led to >500 novel species being added to the ‘List of Prokaryotic names with Standing in Nomenclature’ (LPSN) database every year since 2005[1]. This has included large-scale cultivation projects of host-associated microbial communities[2,3,4,5,6,7,8,9] as well as environmental sources, such as soil[10] and the ocean[11,12]

  • Taxonomic assignment is conducted via identification of the 50 closest relatives within the SILVA Living Tree Project based on 16S rRNA gene sequence identity

  • Species with validly published names according to the DSMZ nomenclature list, supplemented with updates from LPSN, have their type genomes obtained from the Genome Taxonomy Database (GTDB) and used to calculate genome-based delineation values: average nucleotide identity (ANI), percentage of conserved proteins (POCP), and differences in the G + C content of genomic DNA
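
Two of the genome-based delineation values listed above reduce to simple arithmetic once the inputs are available. The following is a minimal sketch, not Protologger's own implementation: the function names are hypothetical, the POCP formula follows the commonly used definition (conserved proteins in both genomes divided by total proteins in both genomes, times 100), and the conserved-protein counts are assumed to have been determined beforehand by a protein comparison step that is not shown here.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted, so N and other
    ambiguity codes do not affect the result.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def gc_difference(seq_a: str, seq_b: str) -> float:
    """Absolute difference in G+C content (percentage points) between two genomes."""
    return abs(gc_content(seq_a) - gc_content(seq_b))


def pocp(conserved_a: int, conserved_b: int, total_a: int, total_b: int) -> float:
    """Percentage of conserved proteins between two genomes.

    conserved_a / conserved_b: proteins of each genome with a qualifying
    match in the other genome (assumed to be computed elsewhere).
    total_a / total_b: total number of proteins in each genome.
    """
    if total_a + total_b == 0:
        raise ValueError("both genomes have zero proteins")
    return 100.0 * (conserved_a + conserved_b) / (total_a + total_b)


if __name__ == "__main__":
    # Toy inputs for illustration only; real genomes are millions of bases long.
    print(gc_difference("ATGC", "GGCC"))
    print(pocp(1500, 1400, 3000, 2800))
```

ANI is omitted because it requires whole-genome alignment and is normally computed with a dedicated tool rather than by hand.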

Introduction

The recent renaissance of cultivation has led to >500 novel species being added to the ‘List of Prokaryotic names with Standing in Nomenclature’ (LPSN) database every year since 2005[1]. This has included large-scale cultivation projects of host-associated microbial communities[2,3,4,5,6,7,8,9] as well as environmental sources, such as soil[10] and the ocean[11,12]. MAGs provide an invaluable background of potentially novel taxa, against which cultured isolates can be compared, strengthening the justification for creating high taxonomic groups, e.g., families[17]. One application of this method is GTDB-Tk[18], a state-of-the-art resource which utilises the genomes of both isolates and MAGs to place queried genomes within the currently sequenced space of taxa. By expanding the taxonomic and genomic landscape, MAGs have facilitated detailed analysis of both described and undescribed taxonomic groups[19,20,21].
