Abstract

This computational protocol describes how to use pyPGCF, a python software package that runs in the linux environment, in order to analyze bacterial genomes and perform: (i) phylogenomic analysis, (ii) species demarcation, (iii) identification of the core proteins of a bacterial genus and its individual species, (iv) identification of species-specific fingerprint proteins that are found in all strains of a species and, at the same time, are absent from all other species of the genus, (v) functional annotation of the core and fingerprint proteins with eggNOG, and (vi) identification of secondary metabolite biosynthetic gene clusters (smBGCs) with antiSMASH. This software has already been implemented to analyze bacterial genera and species that are important for plants (e.g., Pseudomonas, Bacillus, Streptomyces). In addition, we provide a test dataset and example commands showing how to analyze 165 genomes from 55 species of the genus Bacillus. The main advantages of pyPGCF are that: (i) it uses adjustable orthology cut-offs, (ii) it identifies species-specific fingerprints, and (iii) its computational cost scales linearly with the number of genomes being analyzed. Therefore, pyPGCF is able to deal with a very large number of bacterial genomes, in reasonable timescales, using widely available levels of computing power.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.