Abstract

Gene expression profiles offer the possibility of improving risk prediction and optimizing treatment selection for individual patients. Two articles in this issue of the Journal describe clinical studies of gene expression profiling, one a developmental study and the other a validation study. Asgharzadeh et al. (1) address the development of a prognostic classifier for patients with metastatic neuroblastoma lacking amplification of the MYCN gene. Buyse et al. (2) report the validation of a gene expression-based prognostic classifier for patients with early breast cancer.

Asgharzadeh et al. (1) developed their classifier based on the expression of 55 genes; it appears to predict the risk of disease progression more accurately than does patient age, histologic type, or other currently used risk features. The claims of improved risk prediction are based on an internal estimate of prediction accuracy computed by Asgharzadeh et al. The approach taken by the authors allowed them to avoid one of the major pitfalls of developmental studies, which is that they often provide highly biased estimates of accuracy. The fundamental principle is that the same data should not be used for developing a predictive classifier and for evaluating the accuracy of that classifier. This principle is especially important for microarray-based studies because the number of candidate predictors (genes) is generally orders of magnitude greater than the number of cases. In this setting, the bias of using the same data for developing and evaluating a classifier is overwhelming (3).

Some developmental studies avoid this bias by separating the data into a training set used for model development and a test set used for evaluating the predictive accuracy of the model. Although this split-sample approach is useful, it represents an inefficient use of the data in that the training set may be too small to develop an accurate classification model and the test set may be too small to provide an accurate estimate of prediction accuracy. Molinaro et al. (4) showed that various cross-validation approaches can provide better estimates of predictive accuracy. Such methods are based on repeatedly partitioning the sample into a relatively large portion that is used for classifier development and a small portion that is used for classifier evaluation, and then averaging the results over the multiple partitions. In this case, the estimate of prediction accuracy pertains to the model that was developed using the full dataset, which is the model that will be used in future studies. The cross-validation procedure develops multiple classifiers based on reduced training sets only as steps in computing an estimate of prediction accuracy for the classifier developed using the full dataset. In developing a classifier on a reduced training set, the model development algorithm must be applied from scratch, without using any information based on data that are not part of that reduced training set. This point is frequently overlooked by biomedical scientists, statisticians, and computer scientists. Asgharzadeh et al. were careful to use cross-validation methods in this way.
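To make the last point concrete, the following is a minimal sketch of cross-validation in which gene selection is repeated from scratch inside every training partition, so that no information from the held-out cases leaks into classifier development. It is illustrative only, not the pipeline of Asgharzadeh et al.: the synthetic data, the univariate F-test gene filter, the logistic-regression classifier, and the choice of 55 genes (echoing the 55-gene signature) are all assumptions, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data: 100 cases, 10,000 candidate genes, binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10_000))
y = rng.integers(0, 2, size=100)

accuracies = []
for train_idx, test_idx in StratifiedKFold(
        n_splits=10, shuffle=True, random_state=0).split(X, y):
    # The entire development procedure -- gene selection AND model fitting --
    # is applied only to the training portion of this partition.
    model = make_pipeline(SelectKBest(f_classif, k=55),
                          LogisticRegression(max_iter=1000))
    model.fit(X[train_idx], y[train_idx])
    # The held-out portion is used solely to evaluate that classifier.
    accuracies.append(model.score(X[test_idx], y[test_idx]))

# The averaged estimate pertains to the classifier obtained by applying the
# same procedure (selection plus fitting) to the full dataset.
print(f"Cross-validated accuracy estimate: {np.mean(accuracies):.3f}")
```

The common error this sketch guards against is selecting the 55 genes once, on all cases, and then cross-validating only the model fitting; because gene selection has already seen the test cases, the resulting accuracy estimate is severely optimistic.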
