Abstract

The goal of this talk is to introduce an integrated quantitative proteogenomic approach that comprehensively maps proteomic information back to its encoding genes. We seek evidence from mass spectrometry-based large-scale proteomic data of patient populations, in conjunction with patient-centric next-generation sequencing data and unbiased sequencing strategies, to study breast cancer (BC) subtypes in a genomic context. We have obtained global and phosphoproteomic data with matching next-generation sequencing data for 18 patient-derived xenografts (PDXs) representing the major clinical subtypes of BC. Our workflow starts with the creation of several protein sequence databases that serve as templates for mass spectrometry database identifications. These databases include 1) completely annotated reference protein sequences, 2) patient-specific databases created using next-generation sequencing data, 3) isoform databases that contain all possible splicing combinations, and 4) an amino acid sequence database resulting from a six-frame translation of the entire human reference and customized genomes. All mass spectrometry raw data are searched against these databases to obtain identifications at the peptide level, and peptides are assembled for quantification using taxonomy-based label-free quantitation (LFQ), which can specifically quantify the unique human peptide sequences found in PDXs. The peptides are then mapped to the human genome and visualized using a genome browser. Quantitative changes across PDXs are presented at the protein level, or at the isoform level via peptide roll-up to specific exons, and visualized as a quantitative data track. By combining search results from these databases we obtain a comprehensive view of our PDXs. The complementary nature of the databases enables greater proteomic depth: for example, databases with complete splicing combinations capture proteomic evidence when patient-specific databases fail due to possibly erroneous RNA-seq reads. Similarly, six-frame translated amino acid databases can capture potentially novel coding regions but are unable to detect splicing. Peptide maps are obtained for individual genes or specific protein isoforms, covering both knowledge-driven and novel genomic annotation types. We compile peptides carrying variants, splice junctions, fusions, and new coding regions that are specific to each PDX or shared within a specific BC subtype. The majority of the data is mapped back to genomic loci using unmodified peptides from global proteomics, while phosphopeptides that contain variants and splice junctions are mapped in a similar manner using phosphoproteomic data. We have currently annotated 455 novel proteogenomic hits covering many of the examples outlined above for genes related to breast cancer, and we show how these can be specifically identified and, in some instances, differentially quantified in the PDX models.

Citation Format: Harsha P. Gunawardena, John A. Wrobel, Jonathon O'Brien, Ling Xie, Petra Erdmann-Gilmore, Sherri R. Davies, Shunqiang Li, Song Cao, Michael McLellan, Kelly V. Ruggles, David Fenyo, R. Reid Townsend, Li Ding, Bahjat F. Qaqish, Matthew J. Ellis, Xian Chen. Proteogenomic characterization of breast cancer sub-types in patient derived xenografts. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 1999. doi:10.1158/1538-7445.AM2015-1999
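As an illustration of the six-frame translation behind the fourth database type in the abstract, the following is a minimal sketch and not the authors' pipeline. It assumes the standard genetic code, and the function names (`reverse_complement`, `translate`, `six_frame_translation`) and the toy input sequence are hypothetical choices made for this example. A real workflow would run over whole chromosomes and split the translated frames at stop codons before building a searchable database.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Translate three reading frames on each strand (six frames in total)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence chosen for illustration only.
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Because each frame is translated without reference to any gene model, this kind of database can surface peptides from unannotated coding regions, but a peptide spanning a splice junction never appears as a contiguous stretch of genomic sequence, which is consistent with the limitation noted in the abstract.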
