Abstract The Cancer Genome Atlas (TCGA) contains various types of genomic data from a wide variety of cancers, several of which affect the same tissue site. Here we analyze copy number and RNA-Seq data from TCGA for lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). Level 1 (raw) data, rather than Level 3 (segmented) data, from TCGA were used for an integrated analysis of copy number and gene expression profiles of the two cancers. Using both the copy number and SNP probes, re-processed tumor profiles were more consistent with a control set in terms of median number of copy number events, sample ploidy, and breakpoint genes than with the published “level 3” TCGA data. Probe-level data were analyzed using Nexus Copy Number SNP-FASST2 (a multi-state HMM algorithm that uses both SNP and copy number probes in making state assignments), with systematic correction applied for GC bias. Additionally, we performed manual baseline adjustment to correct for sample ploidy based on whole-genome B-allele frequency data for each sample. Overall, the median number of copy number events was reduced from 371 (in the level 3 set) to 299 in the LUAD TCGA data set, and from 681 (in the level 3 set) to 177 in the LUSC TCGA data set. After manual inspection, more than 38% of the TCGA LUAD samples and 50% of the TCGA LUSC samples available at level 3 were found to have incorrect baseline ploidy assignments. The resultant re-analyzed copy number data sets were used for an integrated analysis between the two tumor types. Comprehensive comparative analysis using Fisher's Exact Test revealed statistically significant differences (percent differential = 25%, p<0.001) in copy number profiles between the two lung tumor types; the differential copy number changes include loss of chromosome 1p, loss of chromosome 3p, gain of chromosome 3q, loss of chromosome 4 and loss of chromosome 5.
These changes correlated with differences in overall survival: individuals with loss of chromosome 1p and/or loss of 5q had a significantly poorer prognosis (p<0.05). Although these losses were more frequent in the LUSC sample population, the poorer overall survival was also observed in LUAD samples with chromosome 1p loss. Integration with RNA-Seq expression data from each tumor type revealed statistically significant correlations (p<0.05) with these copy number alterations, identifying potential driver genes of interest within each subtype and in lung tumors in general.

Citation Format: Andrea J. OHara, Raja Keshavan, Zhiwei Che, Soheil Shams. An integrated comparative analysis of TCGA lung adenocarcinoma and lung squamous cell carcinoma copy number and RNA-Seq expression data. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 2978. doi:10.1158/1538-7445.AM2015-2978
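
The comparative step described in the abstract (Fisher's Exact Test combined with a percent differential threshold) can be illustrated with a minimal Python sketch. This is not the Nexus Copy Number implementation. It assumes that "percent differential" means the absolute difference between the percentage of samples carrying an event in each cohort, which the abstract does not define. The function names and the sample counts in the usage note are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in cohort 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def compare_region(with_a, n_a, with_b, n_b, min_diff=25.0, alpha=0.001):
    """Compare event frequency for one region between two cohorts.

    Assumption: percent differential = |% of cohort A with the event
    - % of cohort B with the event|. Thresholds default to the values
    quoted in the abstract (25%, p < 0.001).
    """
    diff = abs(100.0 * with_a / n_a - 100.0 * with_b / n_b)
    p = fisher_exact_two_sided(with_a, n_a - with_a, with_b, n_b - with_b)
    return diff, p, (diff >= min_diff and p < alpha)
```

For example, with hypothetical counts of 60 of 100 samples in one cohort and 20 of 100 in the other carrying a given loss, `compare_region(60, 100, 20, 100)` reports a 40-point differential and flags the region as significant under both thresholds.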