Abstract

Pseudomonas is a highly versatile genus: some species are harmful to humans and plants, while others are widely used for bioengineering and bioremediation. We analysed 432 sequenced Pseudomonas strains by integrating results from a large-scale functional comparison using protein domains with data from six metabolic models, nearly a thousand transcriptome measurements and four large-scale transposon mutagenesis experiments. Through heterogeneous data integration we linked gene essentiality, persistence and expression variability. The pan-genome of Pseudomonas is closed, indicating a limited role of horizontal gene transfer in the evolutionary history of this genus. A large fraction of essential genes are highly persistent; still, non-essential genes represent a considerable fraction of the core-genome. Our results emphasize the power of integrating large-scale comparative functional genomics with heterogeneous data for exploring bacterial diversity and versatility.

Highlights

  • Genome annotation varies in quality because different databases and annotation pipelines apply different methods and may assign different names, acronyms and aliases to the same protein

  • The number of exact matches in gene start-sites is only 73% (4073 genes), confirming previous observations[10]. These 5′ variations in gene identification can result in a putative gain or loss of biological functions. Because the annotation protocols applied use different naming conventions, a direct functional comparison to spot possible differences is not possible (Fig. 1)

  • We considered six genome-scale constraint-based metabolic models describing the metabolism of P. aeruginosa PAO1, P. fluorescens SBW25 and P. putida KT2440


Summary

Introduction

Genome annotation varies in quality because different databases and annotation pipelines apply different methods and may assign different names, acronyms and aliases to the same protein. Re-interpreting these predictions in most cases requires reverse engineering, as data provenance is usually not available. In this paper, 432 Pseudomonas genome sequences were re-annotated de novo, and the resulting annotation was integrated through a semantic platform with data from six metabolic models, nearly a thousand transcriptome measurements and four large-scale transposon mutagenesis experiments. We identified phylogenetic relationships among different species using protein domains and performed an extensive analysis of the core- and pan-genomes of the Pseudomonas genus, taking habitat into account. We linked domain content and domain variability of persistent and essential genes and their transcriptional regulation.
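To make the core- and pan-genome terms concrete, the following is a minimal sketch of how both can be derived from per-strain protein domain content. It is an illustration only and not the pipeline used in the paper: the strain names, the domain identifiers and the helper functions `core_and_pan` and `pan_growth` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def core_and_pan(domains_by_strain: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan): domains shared by all strains, and domains seen in any strain."""
    sets = list(domains_by_strain.values())
    if not sets:
        return set(), set()
    core = set.intersection(*sets)  # present in every strain
    pan = set.union(*sets)          # present in at least one strain
    return core, pan


def pan_growth(domains_by_strain: Dict[str, Set[str]], order: List[str]) -> List[int]:
    """Pan-genome size after adding each strain in the given order."""
    seen: Set[str] = set()
    sizes: List[int] = []
    for strain in order:
        seen |= domains_by_strain[strain]
        sizes.append(len(seen))
    return sizes


# Toy input (hypothetical strains and domain identifiers, not real data).
toy = {
    "strain_A": {"D1", "D2", "D3"},
    "strain_B": {"D1", "D2", "D4"},
    "strain_C": {"D1", "D2"},
}

core, pan = core_and_pan(toy)
growth = pan_growth(toy, ["strain_A", "strain_B", "strain_C"])
print(sorted(core), sorted(pan), growth)
```

In this toy case the growth curve flattens once the third strain adds no new domains. A curve that levels off in this way as more genomes are added is the behaviour usually described as a closed pan-genome.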

