Abstract

Concatenated target-decoy database searches are commonly used in proteogenomic research to identify variant peptides. Currently, two kinds of sequence database are used to store variant sequences for these searches: protein-based and peptide-based. A protein-based database records each full-length wild-type protein sequence with the original amino acids replaced according to the given variant events, whereas a peptide-based database retains only the in silico digested peptides that contain the variants. However, it remains unclear how the various decoy generation methods perform on a peptide-based variant sequence database compared with a protein-based one. In this paper, we thoroughly compare target-decoy databases constructed from these two database types, each coupled with various decoy generation methods, for proteogenomic analyses. The results show that for the protein-based variant sequence database, the reverse and pseudo reverse methods achieve similar performance in variant peptide identification. For the peptide-based database, the pseudo reverse method is more suitable than the widely used reverse method: it identifies 6% more variant PSMs in a HEK293 cell line data set.

Significance

In our survey of publications on proteogenomic studies, 57% of the studies adopt a peptide-based variant sequence database coupled with the reverse method for decoy generation to construct the target-decoy database for searches. However, our results show that with a peptide-based variant sequence database, it is better to adopt the pseudo reverse method for generating decoy sequences, because the reverse method leads to fewer variant peptides being identified.
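The abstract does not define the two decoy generation methods, so the following Python sketch uses their common meaning as an assumption: the reverse method reverses a whole sequence, while the pseudo reverse method reverses each tryptic peptide but keeps its C-terminal residue in place. The function names, the simplified trypsin rule (cleave after K or R, with no proline exception), and the example sequences are all hypothetical and are not taken from the paper.

```python
import re
from typing import List


def reverse_decoy(sequence: str) -> str:
    """Reverse method: reverse the entire sequence."""
    return sequence[::-1]


def pseudo_reverse_peptide(peptide: str) -> str:
    """Pseudo reverse method (assumed definition): keep the C-terminal
    residue fixed and reverse everything before it."""
    if len(peptide) < 2:
        return peptide
    return peptide[:-1][::-1] + peptide[-1]


def tryptic_digest(protein: str) -> List[str]:
    """Simplified in silico trypsin digest: cleave after every K or R.
    Assumption: no proline rule and no missed cleavages."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", protein)


def pseudo_reverse_protein(protein: str) -> str:
    """Apply the pseudo reverse method peptide by peptide and rejoin,
    so that cleavage sites stay at the same positions."""
    return "".join(pseudo_reverse_peptide(p) for p in tryptic_digest(protein))


if __name__ == "__main__":
    peptide = "PEPTIDEK"        # hypothetical entry of a peptide-based database
    protein = "MKTAYIAKQRLLE"   # hypothetical entry of a protein-based database
    print(reverse_decoy(peptide))
    print(pseudo_reverse_peptide(peptide))
    print(reverse_decoy(protein))
    print(pseudo_reverse_protein(protein))
```

Under these assumptions, reversing a single tryptic peptide moves its K or R to the N-terminus, whereas the pseudo reverse decoy still ends in K or R. The sketch only illustrates how the two decoy types differ in form; it does not reproduce the paper's comparison or explain its results.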
