Abstract

Background

Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes.

Results

We applied the iterative BMA algorithm to two cancer datasets: breast cancer and diffuse large B-cell lymphoma (DLBCL) data. On the breast cancer data, the algorithm selected a total of 15 predictor genes across 84 contending models from the training data. The maximum likelihood estimates of the selected genes and the posterior probabilities of the selected models from the training data were used to divide patients in the test (or validation) dataset into high- and low-risk categories. Using the genes and models determined from the training data, we assigned patients from the test data into highly distinct risk groups (as indicated by a p-value of 7.26e-05 from the log-rank test). Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. On the DLBCL data, our iterative BMA procedure selected a total of 25 genes across 3 contending models from the training data. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p-value = 0.00139).

Conclusion

The strength of the iterative BMA algorithm for survival analysis lies in its ability to account for model uncertainty. The results from this study demonstrate that our procedure selects a small number of genes while eclipsing other methods in predictive performance, making it a highly accurate and cost-effective prognostic tool in the clinical setting.
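
The weighted average of posterior distributions mentioned in the Background can be written in the standard BMA form. The block below is a sketch in generic notation, not the authors' own: the symbols \Delta (quantity of interest, such as a patient's risk), D (training data), M_k (the k-th contending model), K (number of selected models) and \beta_g (coefficient of gene g) are assumptions introduced here for illustration.

```latex
% Standard Bayesian Model Averaging identity (notation assumed, not from the source)
\Pr(\Delta \mid D) \;=\; \sum_{k=1}^{K} \Pr(\Delta \mid M_k, D)\,\Pr(M_k \mid D)

% Posterior probability that gene g is a predictor:
% the total posterior weight of the selected models that contain it
\Pr(\beta_g \neq 0 \mid D) \;=\; \sum_{k \,:\, g \in M_k} \Pr(M_k \mid D)
```

Under this reading, a gene with a 100% posterior probability is one that appears in every selected model, which is consistent with the 5 top genes reported for the breast cancer data.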

Highlights

  • Van de Vijver et al. [49] acquired a test set of 295 patient samples with clinical data on which to validate the 70-gene predictive signature. Of these 295 patient samples, 61 samples overlapped with the 78 training samples from van't Veer et al. Since different clinical data and survival information were made available from these two publications, we used these 61 overlapping samples as our training set and the remaining 234 samples as our test set, both of which are available on our supplemental website http://expression.washington.edu/publications/kayee/ibmasurv/

  • Breast Cancer Data: We applied iterativeBMAsurv to the breast cancer dataset of van't Veer et al. [27] using parameters p = 1000, nbest = 50, maxNvar = 15 and cutPoint = 60, and the algorithm selected a total of 15 genes across 84 contending models.
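
To make the role of the posterior probabilities and of cutPoint concrete, here is a minimal Python sketch of how test patients might be split into risk groups. It is not the iterativeBMAsurv implementation (which is an R package). It assumes that a patient's risk score is the posterior-weighted sum, over the selected models, of each model's linear predictor, and that cutPoint is the percentile of the training-set risk scores used as the threshold. The function names `bma_risk_scores` and `assign_risk_groups` and all numbers are hypothetical.

```python
import numpy as np


def bma_risk_scores(expression, coefs, post_probs):
    """Posterior-weighted risk score per patient (assumed form, see lead-in).

    expression: (n_patients, n_genes) expression values
    coefs:      (n_models, n_genes) per-model coefficients,
                with 0 for genes that a model does not include
    post_probs: (n_models,) posterior model probabilities
    """
    expression = np.asarray(expression, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    post_probs = np.asarray(post_probs, dtype=float)
    # Linear predictor of every patient under every model: (n_patients, n_models)
    per_model = expression @ coefs.T
    # Average the linear predictors using the posterior model probabilities
    return per_model @ post_probs


def assign_risk_groups(train_scores, test_scores, cut_point=60):
    """Label test patients 'high' or 'low' risk.

    The threshold is the cut_point-th percentile of the training scores
    (an assumption about how cutPoint is used).
    """
    threshold = np.percentile(train_scores, cut_point)
    groups = np.where(np.asarray(test_scores, dtype=float) > threshold, "high", "low")
    return groups, threshold


# Toy, made-up inputs: 2 contending models over 3 genes
toy_coefs = [[1.0, 0.0, 0.5], [0.0, 2.0, 0.0]]
toy_post_probs = [0.75, 0.25]
toy_train = [[1, 0, 0], [0, 1, 0], [0, 0, 2], [2, 2, 0]]
toy_test = [[0, 0, 1], [3, 0, 0]]

train_scores = bma_risk_scores(toy_train, toy_coefs, toy_post_probs)
test_scores = bma_risk_scores(toy_test, toy_coefs, toy_post_probs)
groups, threshold = assign_risk_groups(train_scores, test_scores, cut_point=60)
```

The threshold is taken from the training scores only, so the test patients never influence where the line between the two risk groups falls. The separation of the resulting groups would then be assessed on the test set with a log-rank test, as described in the abstract.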

Summary

Introduction

Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes. Malignant tumors were generally resected in operable cases, and follow-up radiation therapy was provided to patients with advanced-stage disease. This methodology proved problematic in that a number of low-risk patients experienced cancer recurrence or death within a short time frame, while a contingent of high-risk patients went into permanent remission despite their bleak original prognoses. This indicated a need to explore other indicators by which doctors could understand the underlying prognosis of a given disease and decide on a treatment plan that would optimize the patient's chances of survival. Reducing the number of predictor genes both decreases clinical costs and mitigates the possibility of overfitting due to high inter-variable correlations [1].
