Abstract

Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the number of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate the genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (e.g., Actinobacteria, Gammaproteobacteria, Firmicutes). An accuracy of 95% was obtained when classifying human pathogens and non-pathogens, using a cross-validation scheme with in-fold feature selection. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), which displays not only the prediction (pathogen/non-pathogen) and an associated probability of pathogenicity, but also the presence/absence vector for the analyzed genes, so that the subset of virulence genes responsible for the classification of the analyzed genome can be identified. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, which correspond to eight functional categories, all with evident and documented association with the phenotypes of interest. We also analyze which functional categories of virulence genes were most distinctive for pathogenicity in each taxonomic group, which appears to be a new kind of information and could lead to important evolutionary conclusions.
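The phrase "in-fold feature selection" can be read as follows: the informative genes are re-selected inside each training fold, so the held-out genomes never influence which genes are chosen. The sketch below illustrates that idea only. The frequency-difference score, the simple voting rule, the toy data, and all function names are assumptions made for illustration; they are not the statistical model implemented in BacFier.

```python
from typing import List


def freq_diff(X: List[List[int]], y: List[int]) -> List[float]:
    """Per-gene frequency among pathogens minus frequency among non-pathogens.

    Assumes both classes are present in X (otherwise a division by zero occurs).
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    return [
        sum(r[g] for r in pos) / len(pos) - sum(r[g] for r in neg) / len(neg)
        for g in range(len(X[0]))
    ]


def cross_validate(X: List[List[int]], y: List[int], n_folds: int, k: int) -> float:
    """Accuracy of a toy voting classifier with genes selected inside each fold."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        # Feature selection uses the training genomes of this fold only.
        diffs = freq_diff([X[i] for i in train], [y[i] for i in train])
        genes = sorted(range(len(diffs)), key=lambda g: abs(diffs[g]), reverse=True)[:k]
        for i in test:
            # A present gene votes +1 if enriched in pathogens, -1 otherwise.
            score = sum(X[i][g] * (1 if diffs[g] > 0 else -1) for g in genes)
            prediction = 1 if score > 0 else 0
            correct += int(prediction == y[i])
    return correct / len(X)


# Hypothetical presence/absence matrix: rows are genomes, columns are genes.
X = [
    [1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1],
    [1, 1, 0], [0, 1, 1], [1, 0, 1], [0, 0, 0],
]
y = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = pathogen, 0 = non-pathogen

accuracy = cross_validate(X, y, n_folds=4, k=1)
print(accuracy)
```

Selecting genes on the full data set before splitting would let the test genomes leak into the model and tend to inflate the reported accuracy, which is the problem in-fold selection avoids.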

Highlights

  • Several factors, including globalization and sanitation conditions, have been shaping the world’s landscape of infectious diseases over the years

  • All finished and annotated genomes of human pathogenic and non-pathogenic bacteria were used to perform a presence/absence analysis over 814 groups of orthologous genes belonging to 8 functional categories, in order to determine which ones are strongly related to pathogenicity in different bacterial taxonomic groups (Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Bacteroidetes/Chlorobi, Chlamydiae/Verrucomicrobia, Deltaproteobacteria, Epsilonproteobacteria, Firmicutes, Gammaproteobacteria, Spirochaetes, etc.)

  • Genes that are frequent among pathogens and rare among non-pathogens probably contribute to a pathogen-related phenotype; genes coding for toxins are one example
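The frequency comparison in the last highlight can be sketched in a few lines. This is a minimal illustration, assuming presence/absence data stored as one set of gene names per genome; the genome labels, gene names, and the function `rank_genes_by_enrichment` are hypothetical and do not come from the study.

```python
from typing import Dict, List, Set, Tuple


def rank_genes_by_enrichment(
    presence: Dict[str, Set[str]], is_pathogen: Dict[str, bool]
) -> List[Tuple[str, float, float]]:
    """Return (gene, frequency in pathogens, frequency in non-pathogens),
    sorted so that genes most enriched in pathogens come first."""
    pathogens = [g for g, flag in is_pathogen.items() if flag]
    others = [g for g, flag in is_pathogen.items() if not flag]
    genes = sorted(set().union(*presence.values()))
    ranked = []
    for gene in genes:
        f_path = sum(gene in presence[g] for g in pathogens) / len(pathogens)
        f_non = sum(gene in presence[g] for g in others) / len(others)
        ranked.append((gene, f_path, f_non))
    # Largest (pathogen frequency - non-pathogen frequency) first.
    ranked.sort(key=lambda t: t[1] - t[2], reverse=True)
    return ranked


# Hypothetical toy data: four genomes, three genes.
presence = {
    "A": {"toxin", "housekeeping"},
    "B": {"toxin", "housekeeping", "adhesin"},
    "C": {"housekeeping"},
    "D": {"housekeeping", "adhesin"},
}
is_pathogen = {"A": True, "B": True, "C": False, "D": False}

ranked = rank_genes_by_enrichment(presence, is_pathogen)
for gene, f_path, f_non in ranked:
    print(gene, f_path, f_non)
```

In this toy example the gene present in every pathogen and in no non-pathogen ranks first, while a gene present in all genomes carries no information about pathogenicity.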



Introduction

Several factors, including globalization and sanitation conditions, have been shaping the world’s landscape of infectious diseases over the years. Ninety percent of documented infections in hospitalized patients are caused by bacteria. These cases probably represent only a small proportion of the actual number of bacterial infections occurring in the entire population, and they are usually the most severe ones. Infectious diseases, many of which are of bacterial etiology, are the second leading cause of death in the world (after cardiovascular diseases), killing 2:5 million people annually (WHO, 2008). This scenario shows that, even today, infectious diseases are a permanent threat to human health around the world.

