Validating false discovery rate (FDR) estimation is an essential but surprisingly understudied aspect of method development in shotgun proteomics. Currently available validation protocols mostly rely on ground truth data sets, which typically involve manipulating the properties of the search space or of the query spectra used. As a result, comparisons between estimated FDR values and ground-truth-based false discovery proportions may not be representative of the natural data sets encountered in practice. In this study, we introduce PyViscount, a Python tool implementing a novel validation protocol based on random search space partition. The protocol enables generating a quasi-ground truth from unaltered search spaces of unique candidate peptides and generic data sets of experimental query spectra. Furthermore, validation of existing FDR estimation methods with PyViscount yields results consistent with those of alternative validation protocols. The presented approach to validation, which requires neither synthetic data sets nor questionable manipulation of the data, may be an attractive alternative for proteomics practitioners, allowing them to gain deeper insights into the performance of existing and new FDR estimation methods.