Abstract

Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations in protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow to data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines: Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics.
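The abstract does not spell out how the false discovery rate (FDR) estimation was modified. As an illustration only, the sketch below shows one common way to handle the elevated false positive risk for variant peptides: estimating target-decoy FDR separately for variant and wild-type matches instead of pooling them. This is an assumption, not the authors' documented method; the `PSM` record, its fields, and both function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for a peptide-spectrum match (PSM).
# is_variant marks matches to variant entries (for decoys: decoys derived
# from variant entries). These fields are assumptions for illustration.
PSM = namedtuple("PSM", ["score", "is_decoy", "is_variant"])


def estimate_fdr(psms, threshold):
    """Target-decoy FDR estimate: decoy hits / target hits at or above threshold."""
    accepted = [p for p in psms if p.score >= threshold]
    decoys = sum(1 for p in accepted if p.is_decoy)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


def separate_fdr(psms, threshold):
    """Estimate FDR for variant and wild-type matches separately, plus pooled."""
    variant = [p for p in psms if p.is_variant]
    wild_type = [p for p in psms if not p.is_variant]
    return {
        "variant": estimate_fdr(variant, threshold),
        "wild_type": estimate_fdr(wild_type, threshold),
        "pooled": estimate_fdr(psms, threshold),
    }
```

Because variant matches are usually a small fraction of all matches, a pooled estimate can look acceptable while the variant subset alone has a much higher error rate, which is why a class-specific estimate is sketched here.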

Highlights

  • DNA sequence variation is associated with diseases and differential drug response

  • A SNP annotation method was presented by Bunger et al., in which tandem mass spectrometry (MS/MS) spectra were searched against reference protein databases and a separate SNP database created from peptides from the National Center for Biotechnology Information (NCBI) dbSNP database [14]

  • Setup of the Workflow—As illustrated in Fig. 1, our workflow for identifying wild-type and variant peptides based on shotgun proteomics data includes three steps: database creation, peptide identification, and post-processing
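To make the database creation step concrete, the following minimal sketch applies a single amino acid substitution to a protein sequence and keeps only the tryptic peptides that differ from the wild-type digest. It is an illustration under stated assumptions, not the authors' implementation: the function names, the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and the minimum peptide length are all hypothetical choices.

```python
import re


def tryptic_digest(sequence, min_len=6):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if len(p) >= min_len]


def apply_substitution(sequence, position, ref, alt):
    """Apply a substitution at a 1-based position, checking the reference residue."""
    if sequence[position - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + alt + sequence[position:]


def variant_peptides(sequence, position, ref, alt, min_len=6):
    """Return tryptic peptides of the variant protein absent from the wild-type digest."""
    wild_type = set(tryptic_digest(sequence, min_len))
    mutated = tryptic_digest(apply_substitution(sequence, position, ref, alt), min_len)
    return [p for p in mutated if p not in wild_type]


# Toy sequence (not a real protein) with a hypothetical G10D substitution.
toy_protein = "AAAAAAKGGGGGGRTTTTTTK"
print(variant_peptides(toy_protein, 10, "G", "D"))
```

Only the variant-specific peptides would need to be appended to the reference database in a scheme like this, since the remaining peptides are already covered by the wild-type entry.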


Summary

EXPERIMENTAL PROCEDURES

The cell lines were obtained from American Type Culture Collection (ATCC, Manassas, VA) and grown and harvested within 6 months of date of purchase, or grown from frozen stocks that had been made within 6 months of original purchase. They were grown in medium supplemented with 10% fetal bovine serum, penicillin, and streptomycin at 37 °C with 5% CO2. The resulting peptides were separated on isoelectric focusing strips that were cut into 15 (for cell lines) or 20 (for human tissues) separate fractions. Each of these fractions was analyzed by a second separation on a liquid chromatography column, followed by MS/MS analysis on an LTQ-Orbitrap. All sequence chromatograms were read in both forward (F) and reverse (R) directions.

RESULTS
11 EILDEAYAMAGVGSPYVSR
DISCUSSION
