Abstract

BackgroundMolecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Reliable signature discovery methods could yield valuable insight into cell biology and mechanisms of human disease. However, it is currently not clear how to control error rates such as the false discovery rate (FDR) in signature discovery. Moreover, signatures for cancer gene expression have been shown to be unstable, that is, difficult to replicate in independent studies, casting doubts on their reliability.ResultsWe demonstrate that with modern prediction methods, signatures that yield accurate predictions may still have a high FDR. Further, we show that even signatures with low FDR may fail to replicate in independent studies due to limited statistical power. Thus, neither stability nor predictive accuracy are relevant when FDR control is the primary goal. We therefore develop a general statistical hypothesis testing framework that for the first time provides FDR control for signature discovery. Our method is demonstrated to be correct in simulation studies. When applied to five cancer data sets, the method was able to discover molecular signatures with 5% FDR in three cases, while two data sets yielded no significant findings.ConclusionOur approach enables reliable discovery of molecular signatures from genome-wide data with current sample sizes. The statistical framework developed herein is potentially applicable to a wide range of prediction problems in bioinformatics.

Highlights

  • Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype

  • We found that false discovery rate (FDR) as high as 50% did not degrade predictive accuracy discernably, provided that statistical power was sufficient (Figure 2)

  • Neither predictive accuracy nor stability constitute valid measures of FDR for molecular signatures

Read more

Summary

Introduction

Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Molecular signatures are sets of genes, mRNA transcripts, proteins, genetic variants or other variables that can be used as markers for a particular cell or tissue phenotype, such as a cancerous or diabetic state. Signatures have dual uses, both as predictive models (3) and for discovery of molecular mechanisms (4) While it is well-known how to assess predictive accuracy (5), the method proposed is the first to control signature FDR (6), enabling reliably discovery

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.