Sample size planning for developing classifiers using high-dimensional DNA microarray data

K K Dobbin, R M Simon

doi:10.1093/biostatistics/kxj036

Abstract

Many gene expression studies attempt to develop a predictor of pre-defined diagnostic or prognostic classes. If the classes are similar biologically, then the number of genes that are differentially expressed between the classes is likely to be small compared to the total number of genes measured. This motivates a two-step process for predictor development: a subset of differentially expressed genes is selected for use in the predictor, and then the predictor is constructed from these genes. Both steps introduce variability into the resulting classifier, so both must be incorporated in sample size estimation. We introduce a methodology for sample size determination for prediction in the context of high-dimensional data that captures variability in both steps of predictor development. The methodology is based on a parametric probability model, but permits sample size computations to be carried out in a practical manner without extensive requirements for preliminary data. We find that many prediction problems do not require a large training set of arrays for classifier development.
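The two-step process described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' sample size methodology; it only shows, under assumed settings, how gene selection and classifier construction both depend on the training set size. The simulated data model (independent unit-variance Gaussian genes), the t-like ranking statistic, the nearest-centroid classifier, and all function names and parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def simulate(n_per_class, n_genes, n_informative, effect, rng):
    """Simulate expression data for two classes (hypothetical model).

    The first `n_informative` genes have mean `effect` in class 1 and 0 in
    class 0; all other genes have mean 0 in both classes. Variance is 1.
    """
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [
                rng.gauss(effect * label if g < n_informative else 0.0, 1.0)
                for g in range(n_genes)
            ]
            samples.append(row)
            labels.append(label)
    return samples, labels


def select_genes(samples, labels, k):
    """Step 1: rank genes by an absolute two-sample t-like statistic, keep top k."""
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / se, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def fit_centroids(samples, labels, genes):
    """Step 2: build a nearest-centroid classifier on the selected genes only."""
    centroids = {}
    for label in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == label]
        centroids[label] = [statistics.mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(centroids, genes, sample):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((sample[g] - c) ** 2 for g, c in zip(genes, centroids[label]))
    return min((0, 1), key=dist)


def holdout_accuracy(n_per_class, rng, n_genes=200, n_informative=5, effect=1.5, k=5):
    """Train on n_per_class arrays per class, then score on an independent test set."""
    train_x, train_y = simulate(n_per_class, n_genes, n_informative, effect, rng)
    test_x, test_y = simulate(100, n_genes, n_informative, effect, rng)
    genes = select_genes(train_x, train_y, k)
    centroids = fit_centroids(train_x, train_y, genes)
    hits = sum(predict(centroids, genes, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


if __name__ == "__main__":
    rng = random.Random(0)
    # Accuracy as a function of training arrays per class (illustrative values).
    for n in (5, 10, 20, 40):
        print(n, round(holdout_accuracy(n, rng), 3))
```

Because the test set is drawn independently of the training set, any errors made in the selection step (picking uninformative genes) carry through to the reported accuracy, which is the sense in which both steps contribute variability.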

Journal: Biostatistics	Publication Date: Apr 13, 2006
Citations: 131

