Abstract
Background: Annotation of protein-coding genes is a key step in sequencing projects. Protein functions are mainly assigned on the basis of the amino acid sequence alone, by searching for homologous proteins. However, fully automated annotation often leads to wrong predictions of protein function, so time-intensive manual curation is often essential. Here we describe a fast and reliable way to correct function annotation in sequencing projects, focusing on surface proteomes. We use a proteomics approach previously proven to be very powerful for identifying new vaccine candidates against Gram-positive pathogens. It consists of shaving the surface of intact cells with two proteases, the cleavage-site-specific trypsin and the non-specific proteinase K, followed by LC/MS/MS analysis of the resulting peptides. The identified proteins are then cross-checked by computational analysis, and their sequences are inspected to correct possible errors in function prediction.
Results: When applied to the zoonotic pathogen Streptococcus suis, of which two strains have recently been sequenced and annotated, the approach identified a set of surface proteins without cytoplasmic contamination: all the proteins identified had export or retention signals directing them to the outside and/or the cell surface, and the viability of protease-treated cells was not affected. Combining experimental evidence with computational methods allowed us to determine that two of these proteins are putative new extracellular adhesins that had previously been assigned an incorrect cytoplasmic function. One of them is a putative component of the pilus of this bacterium.
Conclusion: We illustrate the complementary nature of laboratory-based and computational methods for examining, in concert, the localization of a set of proteins in the cell, and demonstrate the utility of this proteomics-based strategy for experimentally correcting function annotation errors in sequencing projects. This approach also provides strong experimental evidence that can be used to annotate proteins to which no Gene Ontology (GO) term has yet been assigned. Correcting function annotation would in turn improve the identification of surface-associated proteins in bacterial pathogens, thus accelerating the discovery of new vaccines in infectious disease research.
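To make the trypsin step concrete, the following is a minimal in-silico sketch of the cleavage rule usually attributed to trypsin: cut on the C-terminal side of lysine (K) or arginine (R), except when the next residue is proline (P). It is an illustration only, not the authors' pipeline; the function name `tryptic_digest`, the `min_length` parameter and the example sequence are hypothetical. Proteinase K, being non-specific, cannot be modelled by a simple rule of this kind.

```python
def tryptic_digest(sequence, min_length=1):
    """Split a protein sequence into peptides using the simple trypsin rule.

    Rule assumed here: cleave after K or R unless the next residue is P.
    Peptides shorter than min_length are discarded.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal remainder if the sequence does not end in K/R.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


# Hypothetical sequence, used only to show the rule at work.
example = "MKTAYIAKQRPLSEK"
print(tryptic_digest(example))
```

In a surface-shaving experiment only the exposed parts of a protein are accessible to the protease, so the peptides actually observed by LC/MS/MS would be a subset of a theoretical digest like this one.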
Highlights
Annotation of protein-coding genes is a key step in sequencing projects
Genome sequencing projects are currently the major source of predicted proteins, and the function of gene products is generally assigned on the basis of the amino acid sequence alone, by searching for homologous proteins in other organisms with similarity search engines such as BLAST [2,3]
Despite recent advances in computational ORF prediction, comprehensive annotation of protein-coding genes remains challenging: fully automated annotation often leads to wrong predictions of protein function [4], and time-intensive manual curation is often essential
Summary
Annotation of protein-coding genes is a key step in sequencing projects. Genome sequencing projects are currently the major source of predicted proteins, and the function of gene products is generally assigned on the basis of the amino acid sequence alone, by searching for homologous proteins in other organisms with similarity search engines such as BLAST [2,3]. Given that protein function is strongly dependent on subcellular localization (SCL), SCL prediction algorithms can help by identifying sequence features such as signal peptides or transmembrane domains [10,11]. These aspects are important when the aim is to select surface antigens for high-throughput vaccine development against pathogens [12]. Mass spectrometry-based proteomics is a powerful approach for validating gene annotation and predicting protein function, as it analyses proteins directly, verifying putative gene products at the level of translation [15,16].
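As a rough illustration of how a sequence feature such as a transmembrane domain can be detected from the amino acid sequence alone, the sketch below scans a protein with a sliding-window Kyte-Doolittle hydropathy average. This is a deliberately simplified stand-in, not one of the SCL predictors cited above; the window of 19 residues and the threshold of 1.6 are commonly quoted defaults for this scale and are treated here as assumptions, and the function name `hydrophobic_windows` and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Return (start, mean_hydropathy) for each window at or above threshold.

    A run of consecutive hits suggests a candidate membrane-spanning
    segment; it is a hint to inspect, not a localization call.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        if mean >= threshold:
            hits.append((start, round(mean, 2)))
    return hits


# Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
toy = "D" * 10 + "L" * 19 + "D" * 10
for start, score in hydrophobic_windows(toy):
    print(start, score)
```

Dedicated predictors combine many more signals (signal-peptide cleavage sites, cell-wall anchoring motifs, homology), which is why the experimental shaving data remain valuable as independent evidence.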