10628 Background: Gene expression analysis is often performed on grossly selected specimens without any microscopic assessment of tumor content. In studies where histological analysis has been performed, cases with 80% or more tumor content are used for microarray analysis. Variability in the proportions of epithelial and stromal cells may lead to misleading differential expression results and to the selection of the wrong therapeutic targets. It is also often unclear whether the genes identified are stromal or epithelial in origin. The goal of this study was to identify genes that define the core epithelial phenotype; these genes could provide a means of normalizing expression data. Methods: The CABIG GSK microarray (HG-U133_plus_2) data, comprising 950 cell lines from carcinoma (n=562), non-carcinoma (n=385) and normal tissue (n=3), were analyzed to identify epithelial-specific genes. Ten carcinomas from each of 11 sites (n=110) and an equal number of non-carcinomas were randomly selected. In silico analyses consisted of 1) identifying genes differentially expressed between carcinoma and non-carcinoma samples using a one-way ANOVA; 2) identifying a gene signature associated with carcinoma using Prediction Analysis of Microarrays (PAM); and 3) identifying co-expression modules using weighted gene coexpression network analysis (WGCNA). A similar analysis was performed on tissue samples (E-GEOD-12360) from carcinomas and non-carcinomas. A Venn diagram was generated to identify the intersecting set. Results: Comparison of the carcinoma and non-carcinoma samples using ANOVA identified 1455 differentially expressed gene probes in cell lines and 540 gene probes in tissues (FDR=1E-10). The cell line analysis identified 5 modules and a 65-gene signature (43 core and 22 accessory genes) that was specific for epithelial cells. In the tissue analysis, a 188-gene signature was similarly identified.
Cross-comparison identified a smaller 31-gene intersecting set; reducing the signature to this set was not associated with a loss of discriminatory power. Conclusions: A 31-gene set that can be used to determine the epithelial content of heterogeneous tumors was identified. This study has the potential to significantly impact the use of microarray-based gene expression data.
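The filtering and cross-comparison steps described in the Methods can be illustrated with a minimal Python sketch. It assumes that per-probe p-values from the one-way ANOVA are already available, and it uses the Benjamini-Hochberg procedure for FDR control; the abstract does not state which FDR method was used, so that choice is an assumption. The function names, probe identifiers, p-values and cutoff below are hypothetical and serve only to show the mechanics, not the study's actual pipeline or data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_probes(p_by_probe, fdr_cutoff):
    """Return the set of probes whose adjusted p-value is <= fdr_cutoff."""
    probes = list(p_by_probe)
    adjusted = benjamini_hochberg([p_by_probe[p] for p in probes])
    return {probe for probe, q in zip(probes, adjusted) if q <= fdr_cutoff}


def intersecting_set(cell_line_signature, tissue_signature):
    """Return probes shared by both signatures (the Venn overlap)."""
    return set(cell_line_signature) & set(tissue_signature)


if __name__ == "__main__":
    # Hypothetical p-values; real input would come from the per-probe ANOVA.
    cell_line_p = {"probe_A": 1e-14, "probe_B": 2e-13, "probe_C": 0.2}
    tissue_p = {"probe_A": 3e-12, "probe_B": 0.4, "probe_D": 1e-15}

    cell_line_hits = significant_probes(cell_line_p, fdr_cutoff=1e-10)
    tissue_hits = significant_probes(tissue_p, fdr_cutoff=1e-10)
    print(sorted(intersecting_set(cell_line_hits, tissue_hits)))
```

Sets are used for the overlap because membership in both signatures is all that matters for the intersecting set; ranking or weighting of probes, as PAM or WGCNA would provide, is outside the scope of this sketch.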