Abstract

Small proteins play essential roles in bacterial physiology and virulence; however, automated genome annotation algorithms often still fail to predict the corresponding genes accurately. The accuracy and reliability of genome annotations, particularly for small open reading frames (sORFs), can be significantly improved by integrating protein evidence from experimental approaches. Here we present a highly optimized and flexible bioinformatics workflow for bacterial proteogenomics covering all steps from (i) generation of protein databases, (ii) database searches and (iii) peptide-to-genome mapping to (iv) visualization of results. We used the workflow to identify high-quality peptide spectrum matches (PSMs) for small proteins (≤ 100 aa, SP100) in Staphylococcus aureus Newman. Protein extracts from S. aureus were subjected to different experimental workflows for protein digestion and prefractionation and measured with highly sensitive mass spectrometers. In total, 175 proteins with up to 100 aa (SP100) were identified. Of these, 24 (ranging from 9 to 99 aa) were novel, i.e. not contained in the genome annotation used. A total of 144 SP100 are highly conserved and were found in at least 50% of the publicly available S. aureus genomes, while 127 are additionally conserved in other staphylococci. Almost half of the identified SP100 were basic, suggesting a role in binding to more acidic molecules such as nucleic acids or phospholipids.
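The abstract classifies almost half of the SP100 as basic but does not state the criterion. A minimal sketch, assuming "basic" means a predicted isoelectric point (pI) above 7.0, computes the pI by bisection on a Henderson-Hasselbalch net-charge model. The pKa values and the 7.0 threshold are illustrative assumptions; published tools use different pKa sets, so absolute pI values will vary slightly between them.

```python
# Illustrative pKa set (assumption); other tools use slightly different values.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1  # one free amino terminus
    counts["Cterm"] = 1  # one free carboxy terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection search for the pH at which the net charge is zero.

    Net charge decreases monotonically with pH, so bisection converges.
    """
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def is_basic(seq: str, threshold: float = 7.0) -> bool:
    """Hypothetical classification rule: pI above `threshold` counts as basic."""
    return isoelectric_point(seq.upper()) > threshold
```

For example, a lysine- and arginine-rich peptide such as "MKKRRKKLLRK" is classified as basic, whereas an aspartate- and glutamate-rich one such as "MDEEDDEEAE" is not.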

Highlights

  • Staphylococcus aureus is a Gram-positive human pathogen of great clinical importance

  • High-quality tandem mass spectrometry (MS/MS) spectra of the identified tryptic peptides unique to SP100 are accessible in the Supplemental Materials

  • A single nucleotide permutation test was used to assess the global confidence and significance of ORFs based on their length alone
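The highlights mention a single nucleotide permutation test that judges ORFs by length alone, without giving details. A minimal sketch, assuming the null model shuffles the nucleotides of a sequence (preserving base composition but destroying codon structure), collects ORF lengths from the shuffled sequences in all six frames, and reports an empirical p-value: the fraction of null ORFs at least as long as the observed one. The start codons (ATG, GTG and TTG, the three most common bacterial starts), the "longest ORF per stop codon" rule, the 9-aa minimum and the shuffle count are assumptions chosen for illustration.

```python
import random

STARTS = {"ATG", "GTG", "TTG"}  # assumption: the three most common bacterial starts
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orf_lengths(dna: str, min_aa: int = 9) -> list[int]:
    """Protein lengths (aa, stop excluded) of ORFs in all six frames.

    For each stop codon, the ORF begins at the first in-frame start codon
    after the previous stop, i.e. the longest possible ORF is reported.
    """
    lengths = []
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    for strand in (dna, rev):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif codon in STOPS and start is not None:
                    aa = (i - start) // 3
                    if aa >= min_aa:
                        lengths.append(aa)
                    start = None
    return lengths


def length_p_value(dna: str, orf_aa: int, n_shuffles: int = 100, seed: int = 0) -> float:
    """Empirical probability that a random ORF is at least `orf_aa` long.

    Null model: single-nucleotide shuffles of `dna`. A pseudocount of one
    keeps the estimate away from exactly zero.
    """
    rng = random.Random(seed)
    bases = list(dna.upper())
    null_lengths = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        null_lengths.extend(orf_lengths("".join(bases)))
    hits = sum(1 for length in null_lengths if length >= orf_aa)
    return (hits + 1) / (len(null_lengths) + 1)
```

For example, `length_p_value(genome_fragment, 60)`, where `genome_fragment` is a hypothetical DNA string, estimates how often an ORF of at least 60 aa arises in composition-matched random sequence. A real analysis would shuffle the whole genome and account for the number of ORFs tested.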



Introduction

Staphylococcus aureus is a Gram-positive human pathogen of great clinical importance. S. aureus mainly causes nosocomial infections in immunocompromised patients, which are frequently associated with difficult-to-treat multidrug-resistant S. aureus phenotypes [1]. With 11,809 genome sequences (including 576 complete genomes) publicly available in the Reference Sequence database of the National Center for Biotechnology Information (RefSeq; status 2020-08-19), S. aureus is among the most frequently sequenced bacteria. A preliminary S. aureus pan-genome based on the comparison of 64 S. aureus genome sequences is composed of 7,411 genes, of which about 20% are conserved and constitute the core genome [3]. The highest variability has been found among genes coding for extracellular and surface-associated proteins [4], which is of particular importance because these proteins are centrally involved in direct interactions with the host environment during infection.
