The majority of studies of microbial functioning in various environments rely on DNA or RNA sequencing techniques, which have inherent limitations and usually provide a distorted picture of the functional status of the studied system. Untargeted proteomics is better suited for that purpose, but it suffers from low efficiency when applied to complex consortia. In practice, the scanning capabilities of currently employed LC-MS/MS systems provide limited coverage of key proteins, hardly allowing even a semiquantitative assessment of the most abundant proteins from the most prevalent species. When particular biological processes of high importance are under investigation, analysis of specific proteins by targeted proteomics is a more appropriate strategy, as it offers superior sensitivity together with increased throughput, dynamic range and selectivity. However, the development of targeted assays requires a priori knowledge of the optimal peptides to be screened for each protein of interest. In complex, multi-species systems, a specific biochemical process may be driven by a large number of homologous proteins with considerable differences in their amino acid sequences, which complicates LC-MS/MS detection. To overcome the complexity of such systems, we have developed an automated pipeline that queries the UniProt database or user-created protein datasets (e.g. from metagenomic studies) to gather homologous proteins with a defined functional role and extract their peptide sequences. The pipeline also computes several protein and peptide properties and relevant statistics in order to deduce a short list of the most representative, process-specific and LC-MS/MS-amenable peptides for the microbial enzymatic activity of interest.
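The peptide-selection idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`tryptic_digest`, `rank_peptides`), the length window of 7 to 25 residues, the exclusion of Met- and Cys-containing peptides, and the ranking by number of homologs covered are all assumptions chosen for illustration, and the toy sequences are invented. The digestion step uses the common simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages).

```python
import re
from collections import defaultdict


def tryptic_digest(sequence):
    """In silico tryptic digest: cleave after K or R unless followed by P.

    Simplified rule with no missed cleavages (an assumption of this sketch).
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def rank_peptides(proteins, min_len=7, max_len=25, avoid="MC"):
    """Rank candidate peptides by how many homologs contain them.

    proteins: dict mapping a protein identifier to its amino acid sequence.
    min_len, max_len, avoid: hypothetical filters standing in for
    "LC-MS/MS-amenable" criteria; real assays would use more properties.
    """
    coverage = defaultdict(set)
    for name, seq in proteins.items():
        for pep in tryptic_digest(seq):
            if min_len <= len(pep) <= max_len and not set(pep) & set(avoid):
                coverage[pep].add(name)
    # Most widely shared peptides first; ties broken alphabetically.
    return sorted(
        ((pep, len(names)) for pep, names in coverage.items()),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    # Invented toy sequences, not real UniProt entries.
    homologs = {
        "homolog_A": "GLVDEIQAKSSPTNWR",
        "homolog_B": "TTKGLVDEIQAKPEPR",
        "homolog_C": "GLVDEIQAKSSPTNWR",
    }
    for peptide, n_proteins in rank_peptides(homologs):
        print(peptide, n_proteins)
```

Ranking by shared occurrence captures only the "most representative" criterion; process specificity would additionally require checking that a candidate peptide is absent from proteins with other functions, which this sketch does not attempt.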