Rank-invariant resampling based estimation of false discovery rate for analysis of small sample microarray data

Nitin Jain, Hyungjun Cho, Jae K Lee, Michael O'Connell

doi:10.1186/1471-2105-6-187

Abstract

Background: The evaluation of statistical significance has become a critical process in identifying differentially expressed genes in microarray studies. Classical p-value adjustment methods for multiple comparisons such as family-wise error rate (FWER) have been found to be too conservative in analyzing large-screening microarray data, and the False Discovery Rate (FDR), the expected proportion of false positives among all positives, has been recently suggested as an alternative for controlling false positives. Several statistical approaches have been used to estimate and control FDR, but these may not provide reliable FDR estimation when applied to microarray data sets with a small number of replicates.

Results: We propose a rank-invariant resampling (RIR) based approach to FDR evaluation. Our proposed method generates a biologically relevant null distribution, which maintains similar variability to observed microarray data. We compare the performance of our RIR-based FDR estimation with that of four other popular methods. Our approach outperforms the other methods both in simulated and real microarray data.

Conclusion: We found that the SAM's random shuffling and SPLOSH approaches were liberal and the other two theoretical methods were too conservative while our RIR approach provided more accurate FDR estimation than the other approaches.

Highlights

The evaluation of statistical significance has become a critical process in identifying differentially expressed genes in microarray studies.
To control the false-positive rate, traditional statistical methods often control the family-wise error rate (FWER), the probability of making at least one false-positive call among all hypotheses tested; for example, the commonly used Bonferroni correction divides the type I error rate α by the total number of hypotheses for the test of each gene's differential expression, assuming the hypotheses under consideration are independent [1].
To overcome these restrictions, we propose a rank-invariant resampling (RIR) approach to False Discovery Rate (FDR) estimation, especially for microarray data with a small number of replicates.

Summary

Introduction

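BMC Bioinformatics 2005, 6:187 http://www.biomedcentral.com/1471-2105/6/187

The evaluation of statistical significance has become a critical process in identifying differentially expressed genes in microarray studies. To control the false-positive rate, traditional statistical methods often control the family-wise error rate (FWER), the probability of making at least one false-positive call among all hypotheses tested; for example, the commonly used Bonferroni correction divides the type I error rate α by the total number of hypotheses for the test of each gene's differential expression, assuming the hypotheses under consideration are independent [1]. This independence assumption is unlikely to hold in microarray data, as the functions of many genes are interrelated to varying degrees. Several authors (e.g., Sidak, Westfall and Young) have developed step-down procedures that apply the severe Bonferroni correction only to the most extreme value of the test statistic and relax the correction step by step for less extreme values. These methods still result in a high false-negative rate, likely missing many genes that are truly differentially expressed.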
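To make the contrast between a single-step Bonferroni correction and a step-down procedure concrete, the following is a minimal Python sketch. It is illustrative only and is not the authors' code: the function names, the example p-values, and the choice of Holm's procedure as a simple representative of step-down methods are all assumptions. In particular, it does not implement the Sidak adjustment or the resampling-based Westfall and Young procedure.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Single-step Bonferroni: reject hypothesis i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm_step_down_reject(pvalues, alpha=0.05):
    """Holm's step-down procedure (stand-in for the step-down family).

    The smallest p-value is compared with the full Bonferroni threshold
    alpha / m; each subsequent p-value gets a less severe threshold
    alpha / (m - rank). Testing stops at the first non-rejection.
    """
    m = len(pvalues)
    # Indices of the hypotheses, ordered from most to least significant.
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all remaining (larger) p-values are not rejected
    return reject


# Hypothetical p-values for five genes (made up for illustration).
pvals = [0.001, 0.012, 0.014, 0.04, 0.2]
print(bonferroni_reject(pvals))      # [True, False, False, False, False]
print(holm_step_down_reject(pvals))  # [True, True, True, False, False]
```

With these made-up values, the step-down rule rejects three hypotheses where Bonferroni rejects one. With thousands of genes, however, both thresholds remain very small, which is consistent with the high false-negative rate noted above.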

Methods

Results

Conclusion