Abstract 2708: Imputation-free analysis of high throughput TMT proteomics of 116 lung squamous samples

Eric A. Welsh,John M. Koomen,Bin Fang,Eric B. Haura,Guolin Zhang,Matthew C. Chambers,Steven A. Eschrich,Paul A. Stewart

doi:10.1158/1538-7445.am2018-2708

Abstract

Introduction: Chemical labeling of peptides using tandem mass tags (TMT) is a “barcoding” strategy, enabling relative protein quantification across a single panel of samples (as opposed to each run separately). Each multiplex assay is, effectively, its own “batch” of samples, and thus direct comparison of intensities between TMT multiplexes is problematic. Additionally, although there is relatively little missing data within a single plex, there can be large differences in missingness across plexes, with the two types of missingness exhibiting different behavior (infrequent and biased towards low abundances within-plex; more frequent and more stochastic between-plex). We have addressed these issues by developing new pipelines for data normalization, protein-level rollup, and downstream clustering, which seek to minimize the negative impact of missingness. This method development was driven by, and applied to, a set of 116 human lung squamous (SQLC) tumors, with the aim of improving the strength of downstream biological signal and interpretation.

Experiment: TMT analysis was performed on 116 SQLC samples. Each 6-plex contained 4 tumors and 2 pool replicates. The shared pool of 116 tumors was assayed on every multiplex to allow for controlling for variability between plexes, with one pool replicate in ch-126 and the other in a channel that varied between plexes. IDPicker was used for spectral quantification. Spectrum abundances were normalized within-plex, and ratios were calculated for each channel against the ch-126 pool. Spectrum-level ratios were rolled up into protein-level ratios using the geometric mean of ratios within each protein group. Protein-level abundances for each ch-126 pool were likewise rolled up by geometric mean, normalized across pools, and then combined by taking the geometric mean for each protein group across pools. These mean protein-level abundances were then used to scale the ratios back into final normalized abundances.
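The ratio rollup step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `protein_ratio_rollup`, the array layout (rows are spectra, columns are channels, with the ch-126 pool in column 0), and the use of NaN for missing values are all assumptions made for illustration. Within-plex normalization and the final scaling back to abundances are omitted because the abstract does not specify them in enough detail.

```python
import numpy as np

def protein_ratio_rollup(intensities, protein_ids, pool_col=0):
    """Roll spectrum-level ratios (channel / pool) up to protein-level ratios.

    Hypothetical layout: `intensities` is (n_spectra, n_channels) for one plex,
    `protein_ids` gives the protein group of each spectrum, and `pool_col` is
    the column holding the reference pool channel. Missing values are NaN and
    are skipped rather than imputed.
    """
    intensities = np.asarray(intensities, dtype=float)
    # Treat non-positive intensities as missing so the logarithm is defined.
    vals = np.where(intensities > 0, intensities, np.nan)
    # Log-ratio of every channel against the pool channel of the same spectrum.
    log_ratios = np.log2(vals) - np.log2(vals[:, [pool_col]])

    ids = np.asarray(protein_ids)
    result = {}
    for pid in sorted(set(protein_ids)):
        block = log_ratios[ids == pid]
        present = ~np.isnan(block)
        counts = present.sum(axis=0)
        sums = np.where(present, block, 0.0).sum(axis=0)
        # Mean of the observed log-ratios; stays NaN where nothing was observed.
        means = np.full(block.shape[1], np.nan)
        np.divide(sums, counts, out=means, where=counts > 0)
        # Geometric mean of ratios = 2 ** (arithmetic mean of log2 ratios).
        result[pid] = 2.0 ** means
    return result

# Toy plex: three spectra, three channels (pool, tumor A, tumor B).
toy = [[100.0, 200.0, 50.0],
       [100.0, 400.0, np.nan],
       [10.0, 10.0, 20.0]]
ratios = protein_ratio_rollup(toy, ["P1", "P1", "P2"])
```

Working in log space makes the geometric mean a plain average, and averaging only over observed spectra means a missing value lowers the number of contributing ratios instead of being replaced by an imputed one.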
Average linkage hierarchical clustering was performed on abundance z-scores using a novel distance metric, calculated as the root mean squared deviation (RMSD) of points present in both vectors, divided by a binary presence/absence similarity coefficient such as Ochiai similarity.

Results: After normalization, principal component analysis showed no batch effect due to differences between plexes. Heat maps generated using the novel distance metric exhibited improved biological signal over RMSD alone. Tumors cluster into 3 major groupings: high immune + low transcriptional/translational activity, low immune + high transcriptional/translational activity, and samples with medium levels of both.

Conclusion: Missingness-aware methods of shared-pool TMT normalization and clustering minimize the negative impact of missingness and yield strong biological signal. Preliminary results suggest that immune response is a major source of differences between lung squamous tumors.

Citation Format: Eric A. Welsh, Paul A. Stewart, Matthew C. Chambers, Guolin Zhang, Bin Fang, Steven A. Eschrich, John M. Koomen, Eric B. Haura. Imputation-free analysis of high throughput TMT proteomics of 116 lung squamous samples [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 2708.
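The clustering distance described in the Experiment section (RMSD over co-present points, divided by a presence/absence similarity) can be sketched as follows. This is one reading of the abstract, not the authors' code: the function name `missingness_aware_distance`, the encoding of missing values as NaN, the choice of Ochiai similarity, and the handling of vectors with no shared points (infinite distance) are assumptions.

```python
import numpy as np

def missingness_aware_distance(x, y):
    """RMSD over points present in both vectors, divided by Ochiai similarity.

    Hypothetical helper: missing values are encoded as NaN. The Ochiai
    similarity of the two presence/absence patterns is
    n_both / sqrt(n_x * n_y), which equals 1 when both patterns are identical.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    present_x = ~np.isnan(x)
    present_y = ~np.isnan(y)
    both = present_x & present_y
    n_both = both.sum()
    if n_both == 0:
        # Assumption: no shared observations means maximally distant.
        return np.inf
    rmsd = np.sqrt(np.mean((x[both] - y[both]) ** 2))
    ochiai = n_both / np.sqrt(present_x.sum() * present_y.sum())
    # Poor overlap (small Ochiai) inflates the distance.
    return rmsd / ochiai

# Two toy z-score vectors sharing two of their three observed points each.
a = [1.0, 2.0, np.nan, 4.0]
b = [1.0, 4.0, 3.0, np.nan]
d_ab = missingness_aware_distance(a, b)
```

With fully observed vectors the similarity is 1 and the metric reduces to plain RMSD, so the division only penalizes pairs whose missingness patterns differ. A square matrix of such distances could be condensed with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` with `method="average"`, provided every pair shares at least one observation so that all distances are finite.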
