Background
The ambient temperature of a habitat is a key physical property that shapes the biology of the microbes inhabiting it. The optimal growth temperature (OGT) of a microbe is therefore a key piece of data needed to understand the evolutionary adaptations manifested in its genome sequence. Unfortunately, there is no growth temperature database or easily downloadable dataset encompassing the majority of cultured microorganisms. We are thus limited in interpreting genomic data to identify temperature adaptations in microbes.
Results
In this work I contribute significantly to closing this gap by mining data from major culture collection centres to obtain growth temperature data for a nonredundant set of 21,498 microbes. The dataset (https://doi.org/10.5281/zenodo.1175608) contains mainly bacteria and archaea and spans psychrophiles, mesophiles, thermophiles and hyperthermophiles. Using these data, 43% of all protein entries in the UniProt database can be annotated with the growth temperature of the species from which they originate. I validate the dataset by showing a Pearson correlation of up to 0.89 between growth temperature and mean enzyme optima, a physiological property directly influenced by the growth temperature. Using the temperature dataset, I correlate the genomic occurrence of enzyme functional annotations with growth temperature. I identify 319 enzyme functions that either increase or decrease in occurrence with temperature. Eight metabolic pathways were statistically enriched for these enzyme functions. Furthermore, I establish a correlation between growth temperature and 33 domains of unknown function (DUFs) in microbes, four of which (DUF438, DUF1524, DUF1957 and DUF3458_C) were significant in both archaea and bacteria.
Conclusions
The growth temperature dataset enables large-scale correlation analysis with enzyme function- and domain-level annotations.
Growth-temperature-dependent changes in the occurrence of these annotations highlight potential evolutionary adaptations. A few of the identified changes were previously known, such as the preference for menaquinone biosynthesis through the futalosine pathway in bacteria growing at high temperatures. Others represent important starting points for future studies, such as DUFs whose occurrence changes with temperature. The growth temperature dataset should become a valuable community resource and will find additional important uses in correlating genomic, transcriptomic, proteomic, metabolomic, phenotypic or taxonomic properties with temperature in future studies.
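
As a rough illustration of the kind of correlation analysis described above, the following minimal sketch computes a Pearson correlation coefficient between growth temperature and the per-genome occurrence of a single annotation. It is not the author's actual pipeline: the function name `pearson_r`, the variable names and the five data points are all hypothetical, and the numbers are invented toy values rather than entries from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented toy values (NOT from the dataset): one entry per hypothetical organism.
growth_temp_c = [10.0, 25.0, 37.0, 55.0, 80.0]   # growth temperature in degrees Celsius
annotation_copies = [0, 1, 1, 2, 3]              # genomic copies of one annotation

r = pearson_r(growth_temp_c, annotation_copies)
print(f"Pearson r = {r:.3f}")
```

In a real analysis one would repeat this for every annotation and correct the resulting significance values for multiple testing; that step is omitted here for brevity.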