Abstract

The promise of microarray technology in providing prediction classifiers for cancer outcome estimation has been confirmed by a number of demonstrable successes. However, the reliability of prediction results relies heavily on the accuracy of statistical parameters involved in classifiers. It cannot be reliably estimated with only a small number of training samples. Therefore, it is of vital importance to determine the minimum number of training samples and to ensure the clinical value of microarrays in cancer outcome prediction. We evaluated the impact of training sample size on model performance extensively based on 3 large-scale cancer microarray datasets provided by the second phase of MicroArray Quality Control project (MAQC-II). An SSNR-based (scale of signal-to-noise ratio) protocol was proposed in this study for minimum training sample size determination. External validation results based on another 3 cancer datasets confirmed that the SSNR-based approach could not only determine the minimum number of training samples efficiently, but also provide a valuable strategy for estimating the underlying performance of classifiers in advance. Once translated into clinical routine applications, the SSNR-based protocol would provide great convenience in microarray-based cancer outcome prediction in improving classifier reliability.

Highlights

  • Recent advances in gene expression microarray technology have opened up new opportunities for better treatment of diverse diseases [1,2,3]

  • The approbation of MammaPrintTM by U.S Food and Drug Administration (FDA) for clinical breast cancer prognosis [5] illustrated the promise of microarray technology in facilitating medical treatment in the future

  • The required minimum number of training samples varies with the complexity of different endpoints

Read more

Summary

Introduction

Recent advances in gene expression microarray technology have opened up new opportunities for better treatment of diverse diseases [1,2,3]. It helped with treatment selection to prolong survival time and improve life quality of cancer patients. The reliability of prediction results relied heavily on the accuracy of statistical parameters involved in microarray classifiers, which cannot be reliably estimated from a small number of training samples. It would help by collecting as many clinical samples as possible. Considering the fact that relatively rare clinical tissue samples can be used for transcriptional profiling, it is a challenge to estimate an appropriate number of training samples enough to achieve significant statistical power

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.