Abstract

False discovery rate (FDR) control is an important tool for statistical inference in feature selection. In mass spectrometry-based metabolomics data, features are measured at different levels of reliability, and false features arising from chemical and/or bioinformatics noise are often detected in untargeted metabolite profiling. Traditional false discovery rate methods treat all features equally, which can cause a substantial loss of statistical power to detect differentially expressed features. We propose a reliability index for mass spectrometry-based metabolomics data with repeated measurements, quantified as a composite measure. We then present a new method for estimating the local false discovery rate (lfdr) that incorporates feature reliability. In simulations, the proposed method achieved a better balance between sensitivity and false discovery control than traditional lfdr estimation. Applied to a real metabolomics dataset, our method detected more differentially expressed metabolites that were biologically meaningful.
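
The reliability index above is described only at a high level, so the following Python sketch is purely illustrative: it assumes a composite built from two plausible ingredients for repeated measurements, the detection rate across replicates and the inverse coefficient of variation, neither of which is confirmed as the paper's actual definition.

```python
# Illustrative sketch of a composite reliability index for features with
# repeated measurements. The two components used here (detection rate and
# inverse coefficient of variation) are assumptions for demonstration only.
import numpy as np

def composite_reliability(replicates: np.ndarray) -> np.ndarray:
    """replicates: shape (n_features, n_replicates); NaN marks a missed detection."""
    detected = ~np.isnan(replicates)
    detection_rate = detected.mean(axis=1)      # fraction of replicates in which the feature was detected

    mean = np.nanmean(replicates, axis=1)
    sd = np.nanstd(replicates, axis=1)
    cv = np.where(mean > 0, sd / mean, np.inf)  # coefficient of variation across replicates
    precision = 1.0 / (1.0 + cv)                # maps CV in [0, inf) into (0, 1]

    # Equal weights are an arbitrary choice, not the paper's composite formula.
    return 0.5 * detection_rate + 0.5 * precision

rng = np.random.default_rng(0)
x = rng.lognormal(mean=2.0, sigma=0.5, size=(100, 3))
x[rng.random(x.shape) < 0.05] = np.nan          # simulate occasional missed detections
print(composite_reliability(x)[:5])
```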

Highlights

  • For each feature, a test statistic or p-value is computed, and false discovery rates are then estimated from the null density fitted to the observed test statistics or p-values

  • In simulation step (5), the expression levels of another 3000 pure-noise metabolites were generated from a normal distribution with standard deviation equal to the maximum noise SD from step (3); a brief code sketch of this step appears after these highlights

  • After obtaining its test statistic from the linear model and adjusting it jointly with all other metabolites, we found that low-density lipoprotein (LDL) was significant by the fdr2d method, with an lfdr value of 0.124
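
The simulation step quoted in the second highlight can be written down directly; the sketch below assumes a placeholder value for the maximum noise SD from step (3) and an arbitrary sample size, since neither number is given here.

```python
# Minimal sketch of simulation step (5): 3000 pure-noise metabolites drawn from a
# normal distribution whose SD equals the maximum noise SD from step (3).
# `max_noise_sd` and `n_samples` are placeholder values, not the paper's settings.
import numpy as np

rng = np.random.default_rng(42)
n_noise_features = 3000
n_samples = 40            # assumed number of samples
max_noise_sd = 1.0        # placeholder for the maximum noise SD from step (3)

# Pure-noise metabolites: no group difference, so every test should be null.
noise_expression = rng.normal(loc=0.0, scale=max_noise_sd,
                              size=(n_noise_features, n_samples))
print(noise_expression.shape)  # (3000, 40)
```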

Introduction

For each feature, a test statistic or p-value is computed, and false discovery rates are then estimated from the null density fitted to the observed test statistics or p-values. When a false discovery rate procedure is applied to the test results of all genes, the low-read-count genes contribute mostly to the null (non-differentially expressed) distribution. Including both high-read-count and low-read-count genes in the FDR or lfdr procedure therefore reduces the significance of all genes. When pure-noise features are present, they contribute to the null distribution, i.e., the uniform distribution in this case, and make all features less significant. In that case, fewer than 100 features can be claimed significant at an FDR level of 0.2 (Fig. 1b).
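
The dilution effect described above can be reproduced with a toy calculation. The sketch below uses the standard Benjamini-Hochberg procedure rather than the paper's lfdr estimator, and the feature counts and p-value distributions are illustrative choices only: adding pure-noise features with uniform p-values shrinks the number of discoveries at a fixed FDR level.

```python
# Toy demonstration: pure-noise features (uniform p-values) dilute the
# discoveries made by a Benjamini-Hochberg FDR procedure at level 0.2.
import numpy as np

def bh_reject(pvals: np.ndarray, alpha: float) -> int:
    """Number of rejections under the Benjamini-Hochberg step-up procedure."""
    p = np.sort(pvals)
    m = len(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = np.nonzero(p <= thresholds)[0]
    return 0 if passed.size == 0 else int(passed[-1]) + 1

rng = np.random.default_rng(1)
p_signal = rng.beta(0.2, 5.0, size=200)   # differentially expressed features: small p-values
p_null = rng.uniform(size=800)            # measured but non-differential features
p_noise = rng.uniform(size=3000)          # pure-noise features also give uniform p-values

print("without pure noise:", bh_reject(np.concatenate([p_signal, p_null]), alpha=0.2))
print("with pure noise:   ", bh_reject(np.concatenate([p_signal, p_null, p_noise]), alpha=0.2))
```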
