Abstract

BackgroundFalse discovery rate (FDR) control is commonly accepted as the most appropriate error control in multiple hypothesis testing problems. The accuracy of FDR estimation depends on the accuracy of the estimation of p-values from each test and validity of the underlying assumptions of the distribution. However, in many practical testing problems such as in genomics, the p-values could be under-estimated or over-estimated for many known or unknown reasons. Consequently, FDR estimation would then be influenced and lose its veracity.ResultsWe propose a new extrapolative method called Constrained Regression Recalibration (ConReg-R) to recalibrate the empirical p-values by modeling their distribution to improve the FDR estimates. Our ConReg-R method is based on the observation that accurately estimated p-values from true null hypotheses follow uniform distribution and the observed distribution of p-values is indeed a mixture of distributions of p-values from true null hypotheses and true alternative hypotheses. Hence, ConReg-R recalibrates the observed p-values so that they exhibit the properties of an ideal empirical p-value distribution. The proportion of true null hypotheses (π0) and FDR are estimated after the recalibration.ConclusionsConReg-R provides an efficient way to improve the FDR estimates. It only requires the p-values from the tests and avoids permutation of the original test data. We demonstrate that the proposed method significantly improves FDR estimation on several gene expression datasets obtained from microarray and RNA-seq experiments.ReviewersThe manuscript was reviewed by Prof. Vladimir Kuznetsov, Prof. Philippe Broet, and Prof. Hongfang Liu (nominated by Prof. Yuriy Gusev).

Highlights

  • False discovery rate (FDR) control is commonly accepted as the most appropriate error control in multiple hypothesis testing problems

  • Appropriate FDR estimation depends on the precise p-values from each test and the validity of the underlying assumptions of the distribution

  • To address the above problems, we propose a novel extrapolative recalibration procedure called Constrained Regression Recalibration (ConReg-R) which models the empirical distribution of p-values in multiple hypothesis testing and recalibrates the imprecise pvalue calculation to better approximated p-values to improve the FDR estimation

Read more

Summary

Introduction

False discovery rate (FDR) control is commonly accepted as the most appropriate error control in multiple hypothesis testing problems. The accuracy of FDR estimation depends on the accuracy of the estimation of p-values from each test and validity of the underlying assumptions of the distribution. In high-throughput biological data analysis, multiple hypothesis testing is employed to address certain biological problems. Appropriate tests are chosen for the data, and the p-values are computed under some distributional assumptions. Due to the large number of tests performed, error rate controls (which focus on the occurrence of false positives) are commonly used to measure the statistical significance. False discovery rate (FDR) control is accepted as the most appropriate error control. Appropriate FDR estimation depends on the precise p-values from each test and the validity of the underlying assumptions of the distribution

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call