Integrated Bottom-Up and Top-Down Proteomics of Patient-Derived Breast Tumor Xenografts

David Fenyö, Ioanna Ntai, Sherri R. Davies, Ryan T. Fellers, Kelly V. Ruggles, Henry Rodriguez, Paul M. Thomas, Philip D. Compton, Emily S. Boja, Shunqiang Li, Neil L. Kelleher, R. Reid Townsend, Jeanne Rumsey, Richard D. LeDuc, Matthew J.C. Ellis, Petra Erdmann-Gilmore, Bryan P. Early

doi:10.1074/mcp.m114.047480

Abstract

Bottom-up proteomics relies on proteases and is the method of choice for identifying thousands of protein groups in complex samples. Top-down proteomics has proven robust for the direct analysis of small proteins and offers a solution to the “peptide-to-protein” inference problem inherent in bottom-up approaches. Here, we describe the first large-scale integration of genomic, bottom-up and top-down proteomic data for the comparative analysis of patient-derived mouse xenograft models of basal and luminal B human breast cancer, WHIM2 and WHIM16, respectively. Using these well-characterized xenograft models established by the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium, we compared the performance of bottom-up and top-down proteomics in detecting cancer-specific aberrations at the peptide and proteoform levels and in measuring differential expression of proteins and proteoforms. Bottom-up proteomic analysis of the tumor xenografts detected almost 10 times as many coding single-nucleotide polymorphisms and peptides resulting from novel splice junctions as top-down. For proteins in the range of 0–30 kDa, where quantitation was performed using both approaches, bottom-up proteomics quantified 3,519 protein groups from 49,185 peptides, while top-down proteomics quantified 982 proteoforms mapping to 358 proteins. Concordant and discordant quantitation were found in a ∼60:40 ratio, giving top-down a unique opportunity to fill in missing information. The two techniques performed in a complementary way: bottom-up yielded eight times more identifications of 0–30 kDa proteins in xenograft proteomes but failed to detect differences in certain posttranslational modifications (PTMs), such as changes in the phosphorylation pattern of alpha-endosulfine.
This work illustrates the power of a combined bottom-up and top-down proteomics approach to deepen our knowledge of cancer biology, especially when genomic data are available.

Highlights

While precise mapping of bottom-up (BU) and top-down (TD) data is complicated because the two approaches measure fundamentally different analytes (peptides versus intact proteoforms), an early estimate of the proteoform-level dynamics not captured by BU can be made: for small, abundant proteins, changes in primary structure not captured by BU occur in about 40% of cases
This study shows that integrating BU and TD proteomics analyses brings significant benefits, because peptide- and proteoform-level measurements are strongly complementary
TD proved sensitive for detecting proteoform-level differences below 30 kDa, such as the multiple phosphorylation forms of alpha-endosulfine, the relative expression of heterozygous alleles of gamma-synuclein and ribosomal protein L35, and domain-specific regions of keratin
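The ∼60:40 split between concordant and discordant quantitation depends on how a BU protein-level change is compared with TD proteoform-level changes. The Python sketch below shows one possible comparison rule, under assumptions not taken from the study: each measurement is called up, down, or unchanged at a hypothetical two-fold cutoff (|log2 fold change| ≥ 1), and a protein counts as concordant only when every TD proteoform call matches the BU call. The function names, protein names, and values are invented for illustration.

```python
"""Hypothetical sketch: classify BU/TD quantitation as concordant or discordant.

Assumptions (not from the study): each protein has one bottom-up log2 fold
change and one or more top-down proteoform log2 fold changes; a measurement
is "changed" when |log2FC| >= THRESHOLD; a protein is concordant when every
proteoform call matches the bottom-up call.
"""
from __future__ import annotations

THRESHOLD = 1.0  # hypothetical cutoff: a two-fold change


def call(log2fc: float, threshold: float = THRESHOLD) -> int:
    """Return +1 (up), -1 (down) or 0 (unchanged) for one log2 fold change."""
    if log2fc >= threshold:
        return 1
    if log2fc <= -threshold:
        return -1
    return 0


def is_concordant(bu_log2fc: float, td_log2fcs: list[float]) -> bool:
    """True when all top-down proteoform calls agree with the bottom-up call."""
    bu_call = call(bu_log2fc)
    return all(call(fc) == bu_call for fc in td_log2fcs)


def concordance_fraction(pairs: dict[str, tuple[float, list[float]]]) -> float:
    """Fraction of proteins whose bottom-up and top-down calls agree."""
    if not pairs:
        raise ValueError("no proteins to compare")
    hits = sum(is_concordant(bu, td) for bu, td in pairs.values())
    return hits / len(pairs)


if __name__ == "__main__":
    # Made-up illustrative values, NOT data from the study.
    example = {
        "PROT_A": (1.5, [1.4, 1.7]),   # BU and both proteoforms up: concordant
        "PROT_B": (0.1, [0.2]),        # both unchanged: concordant
        "PROT_C": (0.0, [2.0, -1.8]),  # proteoforms shift in opposite directions: discordant
    }
    print(f"{concordance_fraction(example):.2f}")  # prints 0.67
```

PROT_C shows why a protein-level BU measurement can miss proteoform-level changes: opposite shifts in two proteoforms can leave the summed protein signal unchanged.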

Summary

EXPERIMENTAL PROCEDURES

Sample Preparation—Cryopulverization of tumor xenografts was performed at Washington University in St. Louis.

Study 2: Label-Free Top-Down Quantitation (Single Fraction up to 30 kDa)—An 8% GELFrEE cartridge was used to obtain a single fraction containing proteins with molecular weights from 0 to 30 kDa. After SDS removal, proteins were resuspended in solvent A and injected onto the RP-4H LC setup described above. GELFrEE fractionation was performed three times for each CompRef sample, and the resulting protein fractions were analyzed in five LC/MS replicates, for a total of 150 RAW files. For each MS1-based mass group, neutral masses were determined from all 150 RAW files, and ProSightPC PUF files were created using a custom version of the cRAWler application. These neutral mass data were searched as described above for Study 1. Representative fractionations for each study are illustrated in Supplemental Fig. S1.

b The term “proteins” corresponds to protein groups as defined by PEAKS Studio, version 7.
c The term “proteins” corresponds to a single RefSeq identifier.
d Identification required a spectrum count of 3 within a single LC/MS run.
e Not performed.
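Two calculations behind this label-free workflow can be sketched in Python: converting an observed m/z and charge state into a neutral mass, and expressing the difference between two samples as a log2 fold change over replicate intensities. This is a minimal illustration assuming protonated, positive-mode ions and a mean-of-log2 abundance summary; it is not the cRAWler or ProSightPC implementation, and the function names are hypothetical.

```python
"""Hypothetical sketch of two steps in a label-free top-down workflow.

Assumptions (not the cRAWler or ProSightPC implementation):
  * precursors are protonated, positive-mode ions [M + zH]^z+;
  * a proteoform's abundance in each sample is a list of replicate
    intensities (for example, one per LC/MS run);
  * relative expression is the difference of mean log2 intensities.
"""
from __future__ import annotations

import math
from statistics import mean

PROTON_MASS = 1.007276  # Da, mass of a proton


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass (Da) of an ion observed at `mz` with positive charge `charge`."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    # Remove the z protons that carry the charge: M = z * (m/z - m_proton).
    return charge * (mz - PROTON_MASS)


def log2_fold_change(sample_a: list[float], sample_b: list[float]) -> float:
    """Mean log2 intensity of sample A minus the mean log2 intensity of sample B."""
    if not sample_a or not sample_b:
        raise ValueError("each sample needs at least one replicate")
    if min(sample_a + sample_b) <= 0:
        raise ValueError("intensities must be positive")
    return mean(math.log2(x) for x in sample_a) - mean(math.log2(x) for x in sample_b)


if __name__ == "__main__":
    # Invented values for illustration only, not data from the study.
    mass = neutral_mass(1001.007276, 10)
    fc = log2_fold_change([4.0e6, 3.6e6, 4.4e6], [1.1e6, 0.9e6, 1.0e6])
    print(f"neutral mass: {mass:.3f} Da, log2 fold change: {fc:.2f}")
```

Averaging log2 intensities (a geometric mean) rather than raw intensities keeps a single unusually intense replicate from dominating the ratio, which matters when each proteoform is measured across many LC/MS runs.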

RESULTS


DISCUSSION