Abstract

As next-generation sequencing becomes broadly applied, genomics and transcriptomics are increasingly used in both research and clinical settings. Proteomics, however, remains a challenge. Most peptide search programs in proteomics rely on a standard reference protein database. Because each individual carries thousands of coding DNA variants, a standard reference database does not provide a perfect match for many of an individual's proteins and peptides. A personalized reference database can improve detection power and accuracy for individual proteomics data. To connect genomics and proteomics, we designed PrecisionProDB, a Python package specialized for generating personalized protein databases for proteomics applications. PrecisionProDB supports multiple popular file formats and reference databases, and can generate a personalized database in minutes. To demonstrate its application, we generated human population-specific reference protein databases with PrecisionProDB, which improved the number of identified peptides by 0.34% on average. In addition, by incorporating cell line-specific variants into the protein database, we demonstrated a 0.71% improvement in peptide identification in the Jurkat cell line. With PrecisionProDB and these datasets, researchers and clinicians can improve their peptide search performance by adopting a more representative protein database or by adding population- and individual-specific proteins to the search database with minimal additional effort. PrecisionProDB and pre-calculated protein databases are freely available at https://github.com/ATPs/PrecisionProDB and https://github.com/ATPs/PrecisionProDB_references. Supplementary data are available at Bioinformatics online.
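To illustrate why a coding variant can leave a peptide unmatched against a standard reference, the following is a minimal sketch in Python. It is not PrecisionProDB's code or API: the function names (`translate`, `apply_snv`), the toy coding sequence, and the variant are all hypothetical and chosen only for illustration. It applies one single-nucleotide variant to a coding sequence and translates both versions with the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snv(cds: str, pos: int, ref: str, alt: str) -> str:
    """Return the CDS with a single-nucleotide variant at 1-based position pos."""
    if cds[pos - 1] != ref:
        raise ValueError(f"reference base mismatch at position {pos}")
    return cds[:pos - 1] + alt + cds[pos:]


# Toy reference CDS and a hypothetical individual variant (position 8, A>T).
REF_CDS = "ATGGCTGAAAAGTGGTAA"
ref_protein = translate(REF_CDS)
personal_protein = translate(apply_snv(REF_CDS, 8, "A", "T"))

# A peptide carrying the variant residue is found only in the personalized entry.
observed_peptide = "AVK"
in_reference = observed_peptide in ref_protein
in_personalized = observed_peptide in personal_protein
```

In this toy case the variant changes one residue, so a peptide spanning that position matches the personalized sequence but not the reference. A real pipeline would also need to handle insertions, deletions, strand orientation, and multiple transcripts, none of which this sketch attempts.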
