A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics

Jing Li, Patrick Halvey, Daniel C Liebler, Robbert J.C Slebos, William Pao, Ze-Qiang Ma, Zengliu Su, David L Tabb, Bing Zhang

doi:10.1074/mcp.m110.006536

Journal: Molecular & Cellular Proteomics
Publication Date: Mar 9, 2011
Citations: 99
License type: cc-by

Affiliation: Vanderbilt University

Abstract

Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations in protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow to data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines: Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics.
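
The abstract does not spell out how the false discovery rate (FDR) estimation was modified. As a rough illustration of why variant identifications may need their own estimate, the sketch below computes a target-decoy FDR once over all peptide-spectrum matches (PSMs) and once over variant PSMs only. This class-specific calculation is an assumption chosen for illustration, not the authors' stated method; the `PSM` record, its fields, the function names, and the example scores are all hypothetical.

```python
from typing import List, NamedTuple


class PSM(NamedTuple):
    """Hypothetical peptide-spectrum match record (not from the paper)."""
    score: float      # search-engine score; higher is assumed to be better
    is_decoy: bool    # True if the match is to a decoy sequence
    is_variant: bool  # True if the matched database entry carries a variant


def target_decoy_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR as (#decoys / #targets) among PSMs at or above threshold."""
    accepted = [p for p in psms if p.score >= threshold]
    decoys = sum(p.is_decoy for p in accepted)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


def variant_fdr(psms: List[PSM], threshold: float) -> float:
    """Same estimate, restricted to PSMs that match variant entries."""
    return target_decoy_fdr([p for p in psms if p.is_variant], threshold)


# Made-up example: many wild-type matches, few variant matches.
psms = (
    [PSM(50.0, False, False)] * 100   # wild-type targets
    + [PSM(50.0, True, False)]        # wild-type decoy
    + [PSM(50.0, False, True)] * 4    # variant targets
    + [PSM(50.0, True, True)] * 2     # variant decoys
)

print(target_decoy_fdr(psms, 40.0))  # pooled estimate over all PSMs
print(variant_fdr(psms, 40.0))       # estimate over variant PSMs only
```

With these made-up counts the pooled estimate is about 3% while the variant-only estimate is 50%, which shows how a pooled threshold can hide a high error rate among the much smaller set of variant matches.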

Highlights

DNA sequence variation is associated with diseases and differential drug response.
A SNP annotation method was presented by Bunger et al. in which tandem mass spectrometry (MS/MS) spectra were searched against reference protein databases and a separate SNP database created from peptides from the National Center for Biotechnology Information (NCBI) dbSNP database [14].
Setup of the Workflow: As illustrated in Fig. 1, our workflow for identifying wild-type and variant peptides based on shotgun proteomics data includes three steps: database creation, peptide identification, and post-processing.
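
The database creation step can be pictured with a minimal sketch: apply a known single amino acid substitution to a protein sequence, digest both versions in silico, and keep the peptides that exist only in the variant form. This is an illustration under stated assumptions, not the authors' implementation. The function names, the example sequence, and the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are assumptions introduced here.

```python
import re
from typing import List


def apply_substitution(protein: str, position: int, ref: str, alt: str) -> str:
    """Replace the residue at a 1-based position, checking the reference residue."""
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}, "
                         f"found {protein[position - 1]}")
    return protein[:position - 1] + alt + protein[position:]


def tryptic_peptides(protein: str) -> List[str]:
    """Simplified tryptic digest: cleave after K/R unless the next residue is P."""
    # Zero-width split points require Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def variant_peptides(protein: str, position: int, ref: str, alt: str) -> List[str]:
    """Peptides produced by the variant protein that the wild type does not yield."""
    wild_type = set(tryptic_peptides(protein))
    mutant = tryptic_peptides(apply_substitution(protein, position, ref, alt))
    return [p for p in mutant if p not in wild_type]


# Hypothetical sequence and substitution (Y5F), for illustration only.
protein = "MKTAYIAKQRQISFVK"
print(tryptic_peptides(protein))
print(variant_peptides(protein, 5, "Y", "F"))
```

Appending such variant-only peptides (or the full variant protein sequences) to the reference database is what lets a search engine match spectra to them. A substitution that creates or removes a K or R changes the cleavage pattern, so comparing whole digests, as done here, catches those cases as well.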

Summary

EXPERIMENTAL PROCEDURES

The cell lines were obtained from American Type Culture Collection (ATCC, Manassas, VA) and grown and harvested within 6 months of date of purchase, or grown from frozen stocks that had been made within 6 months of original purchase. They were grown in medium supplemented with 10% fetal bovine serum, penicillin, and streptomycin at 37 °C with 5% CO2. The resulting peptides were separated on isoelectric focusing strips that were cut into 15 (for cell lines) or 20 (for human tissues) separate fractions. Each of these fractions was analyzed by a second separation on a liquid chromatography column, followed by MS/MS analysis on an LTQ-Orbitrap. All sequence chromatograms were read in both forward (F) and reverse (R) directions.

RESULTS

11 EILDEAYAMAGVGSPYVSR

DISCUSSION