Abstract

Most two-group statistical tests detect broad patterns such as overall shifts in mean, median, or variance. These tests may not have enough power to detect effects in a small subset of samples, e.g., a drug that works well only on a few patients. We developed a novel statistical test targeting such effects, which are relevant for clinical trials, biomarker discovery, feature selection, etc. We focused on finding meaningful associations in complex genetic diseases using gene expression, miRNA expression, and DNA methylation data. Our test outperforms traditional statistical tests on simulated and experimental data and detects potentially disease-relevant genes with heterogeneous effects.

Highlights

  • Two-group statistical tests are widely used to characterize significant differences associated with an intervention or a condition

  • We show that our test is well calibrated and more powerful in detecting the aberration enrichment pattern than 11 other methods, including widely used statistical tests such as the t-test, Limma, Wilcoxon, Levene, and Kolmogorov-Smirnov tests

  • For r = 1, we are essentially simulating cases that are mean-shifted from the controls with no effect on the variance. Other scenarios, such as very low values of r (Additional file 1: Figure S1) and lower values of d ≤ 1 (Additional file 1: Figures S2 and S3), settings with imbalanced number of

  • Simulating gene expression data: In the previous section, we showed that our test was more powerful than a t-test, Wilcoxon test, and Levene test at detecting the aberration enrichment pattern under the simplistic assumption of a perturbed Gaussian when simulating a single variable at a time
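
The single-variable simulation mentioned in the highlights can be illustrated with a minimal Python sketch. This excerpt does not define r and d, so the sketch assumes that r is the fraction of cases that are perturbed and d is the size of the shift in units of the control standard deviation. Under that reading, r = 1 gives a pure mean shift with unchanged variance, which matches the statement above. The function names `simulate_groups` and `standard_test_pvalues` are hypothetical helpers, and the code runs only standard SciPy tests; it does not implement the authors' test.

```python
import numpy as np
from scipy import stats


def simulate_groups(n_controls, n_cases, r, d, rng):
    """Draw one simulated variable for controls and cases.

    Assumption (not taken from the paper): controls ~ N(0, 1), and a
    fraction r of the cases is shifted by d; the remaining cases are
    drawn from the same distribution as the controls.
    """
    controls = rng.normal(0.0, 1.0, size=n_controls)
    cases = rng.normal(0.0, 1.0, size=n_cases)
    n_shifted = int(round(r * n_cases))
    cases[:n_shifted] += d  # perturb only the chosen subset of cases
    return controls, cases


def standard_test_pvalues(controls, cases):
    """P-values from four widely used two-group tests."""
    return {
        "t-test": stats.ttest_ind(cases, controls, equal_var=False).pvalue,
        "Wilcoxon rank-sum": stats.mannwhitneyu(
            cases, controls, alternative="two-sided"
        ).pvalue,
        "Levene": stats.levene(cases, controls).pvalue,
        "Kolmogorov-Smirnov": stats.ks_2samp(cases, controls).pvalue,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Compare a small perturbed subset (r = 0.1) with a full mean shift (r = 1).
    for r in (0.1, 1.0):
        controls, cases = simulate_groups(100, 100, r=r, d=3.0, rng=rng)
        pvals = standard_test_pvalues(controls, cases)
        summary = ", ".join(f"{name}: {p:.3g}" for name, p in pvals.items())
        print(f"r = {r}: {summary}")
```

Repeating the simulation many times and counting how often each p-value falls below a chosen threshold would give an empirical power estimate for each test at a given r and d.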


Summary

Introduction

Two-group statistical tests are widely used to characterize significant differences associated with an intervention or a condition. If a gene is found to be differentially expressed (over-expressed or under-expressed) in disease cases compared to healthy controls, it can potentially be associated with the disease. The differentially expressed gene can be causal for the disease, in which case it can become a candidate for therapeutic intervention, or the association can be non-causal: for example, a compensatory response or a downstream consequence of the disease state itself (immune reaction, treatment effect, etc.). Finding the differentially expressed genes often generates candidates that are further tested for their mechanistic involvement in the disease [1,2,3]. The typical approach for finding differentially expressed genes relies on statistical tests (e.g., Limma [4]) that look for a broad
