Background
We have identified a set of genes whose relative mRNA expression levels in various solid tumors can robustly distinguish cancer from matched normal tissue. Our current feature set consists of 113 gene probes for 104 unique genes. These were originally identified as differentially expressed in solid primary tumors from microarray data on the Affymetrix HG-U133A platform in five tissue types: breast, colon, lung, prostate and ovary. For each dataset, we first identified the set of genes significantly differentially expressed in tumor vs. normal tissue at a significance threshold of p = 0.05, using an experimentally derived error model. Our common cancer gene panel is the intersection of these sets of significantly dysregulated genes, and it distinguishes tumor from normal tissue in all five tissue types.

Methods
Frozen tumor specimens were obtained from two commercial vendors, Clinomics (Pittsfield, MA) and Asterand (Detroit, MI). Biotinylated targets were prepared using published methods (Affymetrix, CA) and hybridized to Affymetrix U133A GeneChips (Affymetrix, CA). Expression values for each gene were calculated with the Affymetrix GeneChip analysis software MAS 5.0. We then used the Genes@Work software package for differential expression discovery and SVMlight with a linear kernel to build classification models.

Results
We validated the predictive power of this gene list on several publicly available datasets generated on the same platform. Notably, when analysing the lung cancer dataset of Spira et al. with a linear-kernel SVM classifier, our gene panel achieved 94.7% leave-one-out accuracy, compared with 87.8% for the gene panel in the original paper. In addition, we performed high-throughput validation on the Dana-Farber Cancer Institute GCOD database and several GEO datasets.

Conclusions
Our results show the potential of this panel as a robust classification tool for multiple tumor types on the Affymetrix platform, as well as on other whole-genome arrays.
Beyond possible use in diagnosing early tumorigenesis, other potential applications of our methodology and gene panel include assisting pathologists in diagnosing pre-cancerous lesions, determining tumor boundaries, assessing levels of contamination in cell populations in vitro, and identifying transformations in cell cultures after multiple passages. Moreover, given the robustness of this gene panel in distinguishing normal from tumor samples, mislabelled or misinterpreted samples can be pinpointed with high confidence.
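The two computational steps in this abstract are building the panel as an intersection of per-tissue significant gene sets and scoring it with leave-one-out accuracy of a linear SVM. The Python sketch below illustrates both. It is a minimal, hypothetical illustration and not the authors' pipeline. It assumes per-tissue p-values are already computed, so neither the experimentally derived error model nor Genes@Work is reproduced. It also assumes the cut-off is applied as a strict p < 0.05. Finally, it substitutes a small soft-margin linear SVM trained by full-batch subgradient descent for SVMlight. All function names, gene names and toy values are invented for illustration.

```python
# Hypothetical sketch (not the authors' code): build a common gene panel
# as an intersection of per-tissue significant sets, then score a linear
# SVM on it with leave-one-out (LOO) accuracy. SVMlight is replaced here
# by a tiny primal SVM trained with full-batch subgradient descent.
from typing import Dict, List, Set, Tuple

ALPHA = 0.05  # assumed strict cut-off: a gene is significant if p < ALPHA


def significant_genes(pvalues: Dict[str, float], alpha: float = ALPHA) -> Set[str]:
    """Genes whose precomputed tumor-vs-normal p-value is below alpha."""
    return {gene for gene, p in pvalues.items() if p < alpha}


def common_panel(per_tissue: Dict[str, Dict[str, float]]) -> Set[str]:
    """Genes significant in every tissue type (set intersection)."""
    sets = [significant_genes(p) for p in per_tissue.values()]
    return set.intersection(*sets) if sets else set()


Model = Tuple[List[float], float]


def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Model:
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).

    Labels must be +1 (tumor) or -1 (normal)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model: Model, x: List[float]) -> int:
    """Sign of the linear decision function: +1 tumor, -1 normal."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def loo_accuracy(X: List[List[float]], y: List[int]) -> float:
    """Hold out each sample once, train on the rest, score the held-out one."""
    correct = 0
    for i in range(len(X)):
        model = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


# Toy usage with invented p-values and expression values.
per_tissue = {
    "breast": {"GENE_A": 0.010, "GENE_B": 0.020, "GENE_C": 0.030},
    "colon": {"GENE_A": 0.020, "GENE_B": 0.010, "GENE_C": 0.040},
    "lung": {"GENE_A": 0.001, "GENE_B": 0.040, "GENE_C": 0.300},
}
panel = common_panel(per_tissue)  # {"GENE_A", "GENE_B"}

# Columns: hypothetical expression of GENE_A and GENE_B per sample.
X = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9],        # tumor samples (+1)
     [-2.0, -1.0], [-1.7, -2.3], [-2.2, -1.8]]  # normal samples (-1)
y = [1, 1, 1, -1, -1, -1]
print(sorted(panel), loo_accuracy(X, y))
```

In practice one would use an established SVM implementation such as SVMlight, as in the abstract. The hand-written trainer is there only to keep the sketch dependency-free and to make the leave-one-out loop explicit.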