Determination of Minimum Training Sample Size for Microarray-Based Cancer Outcome Prediction–An Empirical Assessment

Li Shao, Xiaohui Fan, Ningtao Cheng, Yiyu Cheng, Leihong Wu

doi:10.1371/journal.pone.0068579

Abstract

The promise of microarray technology in providing prediction classifiers for cancer outcome estimation has been confirmed by a number of demonstrable successes. However, the reliability of prediction results relies heavily on the accuracy of the statistical parameters involved in classifiers, and these parameters cannot be reliably estimated from only a small number of training samples. It is therefore vital to determine the minimum number of training samples in order to ensure the clinical value of microarrays in cancer outcome prediction. We extensively evaluated the impact of training sample size on model performance using 3 large-scale cancer microarray datasets provided by the second phase of the MicroArray Quality Control project (MAQC-II). An SSNR-based (scale of signal-to-noise ratio) protocol is proposed in this study for determining the minimum training sample size. External validation on another 3 cancer datasets confirmed that the SSNR-based approach could not only determine the minimum number of training samples efficiently, but also provide a valuable strategy for estimating the underlying performance of classifiers in advance. Once translated into routine clinical application, the SSNR-based protocol would greatly facilitate microarray-based cancer outcome prediction by improving classifier reliability.

Highlights

Recent advances in gene expression microarray technology have opened up new opportunities for better treatment of diverse diseases [1,2,3].
The approval of MammaPrint™ by the U.S. Food and Drug Administration (FDA) for clinical breast cancer prognosis [5] illustrated the promise of microarray technology in facilitating medical treatment in the future.
The required minimum number of training samples varies with the complexity of different endpoints.

Summary

Introduction

Recent advances in gene expression microarray technology have opened up new opportunities for better treatment of diverse diseases [1,2,3]. The technology helps with treatment selection to prolong survival time and improve the quality of life of cancer patients. The reliability of prediction results relies heavily on the accuracy of the statistical parameters involved in microarray classifiers, which cannot be reliably estimated from a small number of training samples. Collecting as many clinical samples as possible would help. However, because clinical tissue samples that can be used for transcriptional profiling are relatively rare, it is a challenge to estimate a number of training samples sufficient to achieve adequate statistical power.
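The dependence of classifier reliability on training sample size can be illustrated with a small simulation. The sketch below is not the SSNR-based protocol of the paper, whose definition is not given on this page. It assumes synthetic Gaussian "expression" data and a simple nearest-centroid classifier, and every function name and parameter (`learning_curve`, `n_informative`, `shift`, and so on) is hypothetical and chosen only for illustration. It estimates held-out accuracy for several training sizes, repeating each size to show how much the estimate varies.

```python
import random
import statistics


def make_dataset(n_per_class, n_genes, n_informative, shift, rng):
    """Synthetic two-class data: only the first n_informative genes differ between classes."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            profile = [
                rng.gauss(shift if (label == 1 and g < n_informative) else 0.0, 1.0)
                for g in range(n_genes)
            ]
            data.append((profile, label))
    return data


def fit_centroids(train):
    """Estimate one mean expression vector per class (the classifier's parameters)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def learning_curve(sizes, n_repeats=20, n_genes=50, n_informative=5, shift=1.0, seed=0):
    """Return {n_per_class: (mean accuracy, std of accuracy)} on a fixed held-out set."""
    rng = random.Random(seed)
    test = make_dataset(100, n_genes, n_informative, shift, rng)
    curve = {}
    for n in sizes:
        accuracies = []
        for _ in range(n_repeats):
            train = make_dataset(n, n_genes, n_informative, shift, rng)
            centroids = fit_centroids(train)
            correct = sum(predict(centroids, x) == y for x, y in test)
            accuracies.append(correct / len(test))
        curve[n] = (statistics.fmean(accuracies), statistics.pstdev(accuracies))
    return curve


if __name__ == "__main__":
    for n, (mean_acc, sd_acc) in learning_curve([3, 10, 30, 100]).items():
        print(f"{n:>4} training samples per class: accuracy {mean_acc:.3f} (sd {sd_acc:.3f})")
```

Under these assumptions, the centroids estimated from a handful of samples are dominated by noise from the uninformative genes, so accuracy is expected to rise as the training set grows. Real microarray endpoints may behave quite differently, which is why an empirical assessment on actual datasets is needed.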

Journal: PLoS ONE
Publication Date: Jul 5, 2013
Citations: 13
License type: CC BY 4.0
