Selecting a proteomic workflow for a given project can be a daunting task. This study provides a guide outlining how individual steps, such as chromatographic separation, data acquisition strategy, and bioinformatic pipeline, affect protein identification. The data presented here will help both expert and nonexpert proteomics users increase proteome coverage and peptide identifications. HeLa protein digests were analyzed on C18 chromatographic columns of different lengths (15 and 50 cm) using top 12 data-dependent acquisition (DDA), top 20 DDA, and data-independent acquisition (DIA) with a nanospray source in positive mode on a Thermo Q Exactive instrument. The raw data were processed with different search engines, rescoring approaches, and multi-engine searches. Results were compared in terms of peptide and protein identifications, precursor properties, and computational requirements to understand the differences between methods. Longer columns and higher top N DDA methods significantly increased protein identifications. Combining multiple search engines yielded limited gains, whereas rescoring methods clearly outperformed the other strategies. Finally, DIA approaches, although successful at generating new identifications, showed limited performance that depended on previously collected DDA data, which can prohibitively increase instrument time; library-free methods, however, showed promising results. Overall, these results highlight the impact of each experimental choice on proteome coverage: changes to the chromatographic column, data acquisition method, or bioinformatic analysis can increase the number of protein identifications by more than 400%. This study therefore provides a reference upon which to build a successful proteomic workflow, with distinct considerations at every step.