Cancer Microarray Datasets Research Articles

Abstract: We present an algorithm, “Ensemble Logical Analysis of Survival Data”, for predicting survival of cancer patients based on patterns identified in gene-expression microarray data. A pattern is defined as a combination of expression levels of multiple genes that is associated with risk of mortality or metastasis of the tumor. We illustrate our method on several breast cancer microarray datasets, including 286 node-negative untreated samples from Wang et al. 2005, 347 primary invasive samples from Ivshina et al. 2006, and 255 early-stage samples from patients who received tamoxifen as adjuvant treatment from Loi et al. 2008. We use the recent molecular stratification of breast cancers into eight robust subtypes, Luminal (LA, LB1, LB2, LB3), Basal-like (BA1, BA2) and Her2+ (Her2+I, Her2+NI), from Alexe et al. 2007, and identify patterns of genes that drive the disease within each of these molecular types. This is followed by an analysis of the genes seen most frequently in patterns to determine the pathways responsible for progression, metastasis or death in each subtype. At each event time point (event = death/recurrence/metastasis), we identify patterns that separate patients into high- and low-risk classes. Each pattern is then associated with a score defined as the integral of the Kaplan-Meier survival curve for the samples that satisfy the pattern. Patterns are not disjoint: a sample can satisfy more than one pattern, even at a given time point. The average of the pattern scores for each patient across all time points defines a patient-specific survival score. All of the above steps are computed using an ensemble method: two-thirds of the data is randomly selected, patterns are generated on this random subset, and patient scores are computed. This is repeated 100 times, and patient scores are aggregated over these 100 runs.
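The scoring and ensemble steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: patterns are passed in as ready-made predicates (the abstract does not say how patterns are generated), the per-time-point step is collapsed into a single pass, every patient covered by a pattern is scored in every run, and the names `kaplan_meier`, `km_area`, `patient_scores` and `horizon` are hypothetical.

```python
import random
from statistics import mean


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event: 1 = event, 0 = censored)."""
    order = sorted(zip(times, events))
    at_risk = len(order)
    surv = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Group all samples that share the same follow-up time.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def km_area(times, events, horizon):
    """Integrate the Kaplan-Meier step function from 0 to `horizon`."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)


def patient_scores(samples, times, events, patterns, horizon, runs=100, seed=0):
    """Average pattern scores per patient over `runs` random two-thirds subsets.

    Assumption: `patterns` is a fixed list of predicates; in the abstract,
    patterns are regenerated on each subset by a method not described there.
    """
    rng = random.Random(seed)
    n = len(samples)
    collected = [[] for _ in range(n)]
    for _ in range(runs):
        subset = rng.sample(range(n), k=(2 * n) // 3)
        for pattern in patterns:
            covered = [i for i in subset if pattern(samples[i])]
            if not covered:
                continue
            score = km_area([times[i] for i in covered],
                            [events[i] for i in covered], horizon)
            # Every patient satisfying the pattern receives its score.
            for i in range(n):
                if pattern(samples[i]):
                    collected[i].append(score)
    # Patients never covered by any pattern get no score.
    return [mean(s) if s else None for s in collected]
```

A pattern here could be as simple as `lambda s: s["GENE_A"] > 0.5` for a hypothetical gene name; a larger score means a larger area under the survival curve of the patients who share that pattern, and hence a better predicted outcome.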
We find that these aggregated scores correlate well with actual survival in the Wang dataset (c-statistic: 86% for LA, 95% for LB1, 88% for LB3, 77% for Basals, and 80% for Her2+). High/low risk class assignment based on the median of the predicted survival score is highly significant (log-rank p-value: 0.01 for LA, 0.0004 for LB1, 0.00006 for LB3, 3.3×10⁻⁸ for Basals and 0.0003 for Her2+). The method that we propose is general and can be used for predicting survival for any cancer type. It can also be applied to RT-PCR data, single nucleotide polymorphisms, microRNA data and data on copy number variations. Finally, we show that our risk score is significantly superior to an Oncotype DX™ score inferred for the samples from the gene expression data.

Citation Information: Cancer Res 2009;69(23 Suppl):C37.
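For readers unfamiliar with the two evaluation measures quoted above, the following is a minimal sketch of how a c-statistic and a median-split log-rank p-value are commonly computed. It assumes Harrell's pairwise definition of concordance and the standard two-group log-rank test with one degree of freedom; the abstract does not state which variants the authors used, and the function names are hypothetical.

```python
import math
from statistics import median


def concordance_index(times, events, scores):
    """Harrell's c: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when i had an event before j's follow-up
    time. Higher score is assumed to mean longer predicted survival.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] < scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def median_split(scores):
    """Flag patients whose predicted survival score is below the median as high risk."""
    cut = median(scores)
    return [s < cut for s in scores]


def logrank_p(times, events, in_high):
    """Two-group log-rank p-value (chi-square, 1 degree of freedom)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, in_high) if x >= t and g)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, in_high)
                 if x == t and e and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

A c-statistic of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the percentages above should be read.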

Background: Accuracy in the diagnosis of breast cancer and classification of cancer subtypes has improved over the years with the development of well-established immunohistopathological criteria. More recently, diagnostic gene-sets at the mRNA expression level have been tested as better predictors of disease state. However, breast cancer is heterogeneous in nature; thus extraction of differentially expressed gene-sets that stably distinguish normal tissue from various pathologies poses challenges. Meta-analysis of high-throughput expression data using a collection of statistical methodologies leads to the identification of robust tumor gene expression signatures.

Methods: A resampling-based meta-analysis strategy was developed that combines resampling with distribution statistics to assess the significance of differential expression between sample classes. Two independent microarray datasets that contain normal breast, invasive ductal carcinoma (IDC), and invasive lobular carcinoma (ILC) samples were used for the meta-analysis. Expression of genes selected from the gene list for classification of normal breast samples and breast tumors, encompassing both the ILC and IDC subtypes, was tested on 10 independent primary IDC samples and matched non-tumor controls by real-time qRT-PCR. Other existing breast cancer microarray datasets were used in support of the resampling-based meta-analysis.

Results: The two independent microarray studies were found to be comparable, although differing in their experimental methodologies (Pearson correlation coefficient, R = 0.9389 and R = 0.8465 for ductal and lobular samples, respectively). The resampling-based meta-analysis led to the identification of a highly stable set of genes for classification of normal breast samples and breast tumors encompassing both the ILC and IDC subtypes. The expression results of the selected genes obtained through real-time qRT-PCR supported the meta-analysis results.

Conclusion: The proposed meta-analysis approach can detect a set of differentially expressed genes with the least amount of within-group variability, thus providing highly stable gene lists for class prediction. The increased statistical power and stringent filtering criteria used in the present study also make identification of novel candidate genes possible and may provide further insight into breast cancer development.
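The abstract does not spell out the resampling procedure, so the following is only one plausible reading, sketched for a single gene: bootstrap each class repeatedly, recompute a Welch t-statistic on every resample, and report how often the statistic clears a threshold. The bootstrap scheme, the choice of Welch's t, the threshold of 2.0 and all function names are assumptions made for illustration.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def resampled_selection_rate(tumor, normal, runs=200, threshold=2.0, seed=0):
    """Fraction of bootstrap resamples in which |t| reaches `threshold`.

    A gene with a rate near 1.0 is called differentially expressed in almost
    every resample, which is one way to read "stable" in the text above.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        a = rng.choices(tumor, k=len(tumor))    # sample with replacement
        b = rng.choices(normal, k=len(normal))
        try:
            t = welch_t(a, b)
        except ZeroDivisionError:
            # Degenerate resample with zero variance in both classes.
            continue
        if abs(t) >= threshold:
            hits += 1
    return hits / runs
```

Applied across all genes, ranking by this rate favors genes whose class difference does not depend on a few individual samples, in the spirit of the low within-group variability emphasized in the conclusion.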

Articles published on Cancer Microarray Datasets

Abstract C37: Ensemble survival prediction for breast cancer using microarray data

CXCL12/SDF1 expression by breast cancers is an independent prognostic marker of disease-free and overall survival

A Bayesian approach to efficient differential allocation for resampling-based significance testing.

Demoting redundant features to improve the discriminatory ability in cancer data

Systematic identification of transcription factors associated with patient survival in cancers

An expression module of WIPF1-coexpressed genes identifies patients with favorable prognosis in three tumor types

PrognoScan: a new database for meta-analysis of the prognostic value of genes

Network-Based Inference of Cancer Progression from Microarray Data

Introducing Molecular Subtyping of Breast Cancer Into the Clinic?

Integrating Biological Knowledge with Gene Expression Profiles for Survival Prediction of Cancer

Identification of condition-specific regulatory modules through multi-level motif and mRNA expression analysis

A resampling-based meta-analysis for detection of differential gene expression in breast cancer

Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data

Prognostic Breast Cancer Signature Identified from 3D Culture Model Accurately Predicts Clinical Outcome across Independent Datasets

The role of Nrf2 in increased reactive oxygen species and DNA damage in prostate tumorigenesis

Investigation of Self-Organizing Oscillator Networks for Use in Clustering Microarray Data

Cluster analysis of genome-wide expression data for feature extraction

Building pathway clusters from Random Forests classification using class votes

Comparing the Characteristics of Gene Expression Profiles Derived by Univariate and Multivariate Classification Methods

Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms
