Abstract

We present an algorithm, “Ensemble Logical Analysis of Survival Data”, for predicting the survival of cancer patients from patterns identified in gene-expression microarray data. A pattern is defined as a combination of expression levels of multiple genes that is associated with risk of mortality or tumor metastasis. We illustrate our method on several breast cancer microarray datasets, including 286 node-negative, untreated samples from Wang et al. 2005; 347 primary invasive samples from Ivshina et al. 2006; and 255 early-stage samples from patients who received tamoxifen as adjuvant treatment, from Loi et al. 2008. We use the recent molecular stratification of breast cancers into eight robust subtypes from Alexe et al. 2007, namely Luminal (LA, LB1, LB2, LB3), Basal-like (BA1, BA2) and Her2+ (Her2+I, Her2+NI), and identify patterns of genes that drive the disease within each of these molecular types. We then analyze the genes seen most frequently in patterns to determine the pathways responsible for progression, metastasis, or death in each subtype.

At each event time point (event = death/recurrence/metastasis), we identify patterns that separate patients into high- and low-risk classes. Each pattern is then assigned a score, defined as the integral of the Kaplan-Meier survival curve for the samples that satisfy the pattern. Patterns are not disjoint: a sample can satisfy more than one pattern, even at a given time point. The average of the scores of the patterns a patient satisfies, taken across all time points, defines a patient-specific survival score. All of the above steps are carried out within an ensemble procedure: two-thirds of the data are randomly selected, patterns are generated on this random subset, and patient scores are computed. This is repeated 100 times, and the patient scores are aggregated over the 100 runs.
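The scoring and ensemble steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: a pattern is represented as a mapping from gene name to an allowed (low, high) expression interval, pattern generation is delegated to a caller-supplied, hypothetical `find_patterns` function that returns the patterns found across all event time points, every patient who satisfies at least one pattern is scored in each run, and the 100 runs are aggregated by a plain mean. The Kaplan-Meier integral is taken up to a fixed `horizon`, which the abstract does not specify.

```python
import random
from statistics import mean


def km_area(times, events, horizon):
    """Area under the Kaplan-Meier step curve on [0, horizon]."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, area, prev_t = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = leaving = 0
        # Collect everyone whose follow-up ends at time t (events and censored).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        area += surv * (t - prev_t)          # curve is flat since prev_t
        if deaths:
            surv *= 1 - deaths / at_risk     # Kaplan-Meier product step
        at_risk -= leaving
        prev_t = t
    return area + surv * (horizon - prev_t)


def satisfies(expr, pattern):
    """True if every gene in the pattern lies inside its (low, high) interval."""
    return all(lo <= expr[g] <= hi for g, (lo, hi) in pattern.items())


def pattern_score(pattern, cohort, horizon):
    """Kaplan-Meier area for the cohort members that satisfy the pattern."""
    covered = [(t, e) for expr, t, e in cohort.values() if satisfies(expr, pattern)]
    if not covered:
        return None
    times, events = zip(*covered)
    return km_area(times, events, horizon)


def patient_scores(cohort, train, patterns, horizon):
    """Average, per patient, the scores of all patterns that patient satisfies."""
    scored = [(p, pattern_score(p, train, horizon)) for p in patterns]
    out = {}
    for pid, (expr, _, _) in cohort.items():
        hits = [s for p, s in scored if s is not None and satisfies(expr, p)]
        if hits:
            out[pid] = mean(hits)
    return out


def ensemble_scores(cohort, find_patterns, horizon, runs=100, frac=2 / 3, seed=0):
    """Repeat subsample -> patterns -> scores, then average each patient's scores.

    cohort maps patient id -> (expression dict, follow-up time, event flag 0/1).
    find_patterns is a hypothetical stand-in for the pattern generation step.
    """
    rng = random.Random(seed)
    ids = sorted(cohort)
    collected = {pid: [] for pid in ids}
    for _ in range(runs):
        train_ids = rng.sample(ids, round(frac * len(ids)))
        train = {pid: cohort[pid] for pid in train_ids}
        patterns = find_patterns(train)
        for pid, s in patient_scores(cohort, train, patterns, horizon).items():
            collected[pid].append(s)
    return {pid: mean(v) for pid, v in collected.items() if v}
```

Under these assumptions, a higher aggregated score corresponds to longer expected survival, so a patient covered mainly by patterns whose members survive longer receives a higher score than one covered by patterns whose members fail early.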
We find that these aggregated scores correlate very well with actual survival in the Wang dataset (c-statistic: 86% for LA, 95% for LB1, 88% for LB3, 77% for Basals, and 80% for Her2+). High/low risk class assignment based on the median of the predicted survival score is highly significant (log-rank p-value: 0.01 for LA, 0.0004 for LB1, 0.00006 for LB3, 3.3×10⁻⁸ for Basals, and 0.0003 for Her2+). The proposed method is general and can be used to predict survival for any cancer type. It can also be applied to RT-PCR data, single nucleotide polymorphisms, microRNA data, and copy number variation data. Finally, we show that our risk score is significantly superior to an Oncotype DX™ score inferred for the samples from the gene expression data.

Citation Information: Cancer Res 2009;69(23 Suppl):C37.