A comparative review of estimates of the proportion unchanged genes and the false discovery rate

Per Broberg

doi:10.1186/1471-2105-6-199

Abstract

Background: In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining equally strong evidence of change by pure chance. The distribution of these p-values is a mixture of two components corresponding to the changed genes and the unchanged ones. The focus of this article is how to estimate the proportion of unchanged genes and the false discovery rate (FDR), and how to make inferences based on these concepts. Six published methods for estimating the proportion of unchanged genes are reviewed, two alternatives are presented, and all are tested on both simulated and real data. All estimates but one make do without any parametric assumptions concerning the distributions of the p-values. Furthermore, the estimation and use of the FDR and the closely related q-value are illustrated with examples. Five published estimates of the FDR and one new estimate are presented and tested. Implementations in R code are available.

Results: A simulation model based on the distribution of real microarray data, plus two real data sets, was used to assess the methods. The proposed alternative methods for estimating the proportion of unchanged genes fared very well, and gave evidence of low bias and very low variance. Different methods perform well depending upon whether there are few or many regulated genes. Furthermore, the methods for estimating the FDR showed varying performance, and were sometimes misleading. The new method had a very low error.

Conclusion: The concept of the q-value or false discovery rate is useful in practical research, despite some theoretical and practical shortcomings. However, it seems possible to challenge the performance of the published methods, and there is likely scope for further developing the estimates of the FDR. The new methods provide the scientist with more options to choose a suitable method for any particular experiment. The article advocates the use of the conjoint information regarding false positive and negative rates, as well as the proportion of unchanged genes, when identifying changed genes.

Highlights

In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining strong evidence of change by pure chance
1. the beta-uniform model (BUM) [10], which fits a mixture of a uniform and a beta distribution to the observed p-values; function ext.pi
5. the bootstrap least squares estimate [3], which is related to the previous estimate; function qvalue or estimatep0
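
For orientation, the beta-uniform mixture named in item 1 is commonly written as below. This is a sketch of the usual textbook form of the model, not a formula quoted from this article; the symbols \lambda and a are notation assumed here, and the article's own parametrisation may differ.

```latex
% Beta-uniform mixture density for a p-value x (assumed notation):
% a uniform component with weight \lambda plus a Beta(a, 1) component.
f(x \mid a, \lambda) = \lambda + (1 - \lambda)\, a\, x^{a - 1},
\qquad 0 < x \le 1,\quad 0 < \lambda < 1,\quad 0 < a < 1 .

% The density is smallest at x = 1, which gives an upper bound on the
% proportion of unchanged genes once a and \lambda have been fitted:
\hat{\pi}_{\mathrm{ub}} = \hat{\lambda} + (1 - \hat{\lambda})\, \hat{a} .
```

The bound arises because no more than the flat part of the fitted density at x = 1 can be attributed to a uniform (unchanged) component.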

Summary

Introduction

In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining strong evidence of change by pure chance. The distribution of these p-values is a mixture of two components corresponding to the changed genes and the unchanged ones. The microarray technology permits the simultaneous measurement of the transcription of thousands of genes. The analysis of such data has turned out to be quite a challenge.

The proportion unchanged

In the two-component model for the distribution of the test statistic, the mixing parameter p0, which represents the proportion of unchanged genes, is not estimable without strong distributional assumptions; see [1].
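
The two-component picture can be made concrete with a small simulation. The sketch below is illustrative only: it is written in Python rather than the R implementations the article refers to, and it does not reproduce any of the article's functions. It assumes that p-values of unchanged genes are uniform on (0, 1) and, purely for illustration, that p-values of changed genes follow a Beta(0.3, 1) distribution. The helper names simulate_pvalues, estimate_p0 and qvalues are hypothetical. The p0 estimate is the familiar threshold estimate (count the p-values above a cut-off lam and rescale), and the q-values are a step-up adjustment scaled by the estimated p0.

```python
import random


def simulate_pvalues(m, p0, alt_shape=0.3, seed=0):
    """Draw m p-values from a two-component mixture (illustrative assumption).

    A fraction p0 is uniform on (0, 1) (unchanged genes); the remainder is
    Beta(alt_shape, 1), which concentrates near zero (changed genes).
    """
    rng = random.Random(seed)
    n_null = round(m * p0)
    null = [rng.random() for _ in range(n_null)]
    alt = [rng.betavariate(alt_shape, 1.0) for _ in range(m - n_null)]
    return null + alt


def estimate_p0(pvalues, lam=0.5):
    """Threshold estimate of the proportion unchanged.

    P-values above lam are treated as coming mostly from unchanged genes,
    so their count is rescaled by the width of the interval (lam, 1].
    """
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def qvalues(pvalues, p0=1.0):
    """Step-up q-values, scaled by an estimate of the proportion unchanged."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q is monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    pv = simulate_pvalues(m=5000, p0=0.8, seed=1)
    p0_hat = estimate_p0(pv, lam=0.5)
    q = qvalues(pv, p0=p0_hat)
    n_called = sum(1 for value in q if value <= 0.05)
    print(f"estimated p0: {p0_hat:.3f}; genes with q <= 0.05: {n_called}")
```

Because some changed genes also produce p-values above the cut-off, this kind of estimate tends to overshoot the true p0. That is consistent with the caveat above: without distributional assumptions the data only support a conservative estimate of p0.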

Journal: BMC Bioinformatics	Publication Date: Jan 1, 2005
Citations: 84	License type: CC BY 2.0
