Permutation-based statistical tests for multiple hypotheses

Anyela Camargo, Francisco Azuaje, Haiying Wang, Huiru Zheng

doi:10.1186/1751-0473-3-15

Abstract

Background: Genomics and proteomics analyses regularly involve the simultaneous test of hundreds of hypotheses, either on numerical or categorical data. To correct for the occurrence of false positives, validation tests based on multiple testing correction, such as Bonferroni and Benjamini and Hochberg, and on re-sampling, such as permutation tests, are frequently used. Despite the known power of permutation-based tests, most available tools offer such tests for either the t-test or ANOVA only. Less attention has been given to tests for categorical data, such as the Chi-square test. This project takes a first step by developing an open-source software tool, Ptest, that addresses the need for public software tools incorporating these and other statistical tests with options for correcting for multiple hypotheses.

Results: This study developed a public-domain, user-friendly software tool whose purpose was twofold: first, to estimate test statistics for categorical and numerical data; and second, to validate the significance of the test statistics via Bonferroni, Benjamini and Hochberg, and a permutation test of numerical and categorical data. The tool allows the calculation of the Chi-square test for categorical data, and the ANOVA test, Bartlett's test and the t-test for paired and unpaired data. Once a test statistic is calculated, Bonferroni, Benjamini and Hochberg, and a permutation test are implemented, independently, to control for Type I errors. An evaluation of the software using different public data sets is reported, which illustrates the power of permutation tests for multiple hypotheses assessment and for controlling the rate of Type I errors.

Conclusion: The analytical options offered by the software can be applied to support a significant spectrum of hypothesis testing tasks in functional genomics, using both numerical and categorical data.
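The two multiple testing corrections named in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the standard Bonferroni and Benjamini and Hochberg adjustments, not the Ptest implementation; the function names and the five raw P-values are hypothetical and chosen only for the example.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini and Hochberg adjusted P-values (step-up FDR procedure)."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for five features (illustrative only).
raw = [0.001, 0.012, 0.02, 0.045, 0.60]
alpha = 0.05

bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)

n_raw = sum(p < alpha for p in raw)
n_bh = sum(p < alpha for p in bh)
n_bonf = sum(p < alpha for p in bonf)
print(n_raw, n_bh, n_bonf)
```

On this toy input, fewer features pass after Benjamini and Hochberg than before correction, and fewer still after Bonferroni, which is the usual ordering in strictness between the two methods.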

Highlights

Genomics and proteomics analyses regularly involve the simultaneous test of hundreds of hypotheses, either on numerical or categorical data
Current statistical inference problems in areas such as genomics and proteomics regularly involve the simultaneous test of hundreds of null hypotheses
The permutation test identified more features than Benjamini and Hochberg (B&H): 153 single nucleotide polymorphisms (SNPs) with significant P-values. These results are consistent with the results reported by Carlson et al. (2003) [16], which found that only 48% of the SNPs were shared by African-Americans and European-Americans
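A permutation test for categorical data, such as genotype counts per population, can be sketched as follows. This is an assumed, minimal version of a Chi-square permutation test using only the Python standard library; the function names, the number of permutations, and the toy genotype data are hypothetical and do not come from the paper or from Ptest.

```python
import random
from collections import Counter


def chi_square(labels, categories):
    """Pearson Chi-square statistic for two paired categorical sequences."""
    n = len(labels)
    row_totals = Counter(labels)
    col_totals = Counter(categories)
    observed = Counter(zip(labels, categories))
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def chi_square_permutation_p(labels, categories, n_permutations=2000, seed=0):
    """Estimate a P-value by shuffling group labels against fixed categories."""
    rng = random.Random(seed)
    observed = chi_square(labels, categories)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if chi_square(shuffled, categories) >= observed - 1e-9:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: one SNP genotyped in two populations of ten samples each.
population = ["pop1"] * 10 + ["pop2"] * 10
genotype = ["AA"] * 9 + ["AG"] * 1 + ["AA"] * 1 + ["AG"] * 9

stat = chi_square(population, genotype)
p_perm = chi_square_permutation_p(population, genotype)
print(stat, p_perm)
```

In a multiple-hypothesis setting, the same procedure would be repeated per SNP, and the resulting P-values compared against the chosen significance level.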

Summary

Results

To illustrate some of the advantages of using the permutation-based test for multiple hypotheses validation, this section summarises examples of analyses using publicly available data. A fourth analysis implemented the ANOVA test to estimate the potential statistically significant difference between the means of three (normally distributed) experimental groups. Samples in this data set were obtained from heart tissue of healthy donors, as well as from donors suffering from either dilated or ischemic cardiomyopathy [see Additional file 7]. The null hypothesis of this ANOVA analysis was that there were no differences between the means of the three groups, and the significance level to reject the null hypothesis was set to 0.05. In this case, the raw P-values of 6371 genes were under the significance level (P < 0.05), 3331 genes were under the significance level after correcting with B&H, and only nine genes were under the significance level after correcting with Bonferroni. Perhaps this analysis showed the real strength that the permutation test has to identify potential biomarkers of disease.
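The idea of a permutation test for a three-group ANOVA can be sketched as below for a single gene. This is a hedged illustration under stated assumptions, not the procedure used by Ptest: the function names, the number of permutations, and the expression values are all hypothetical, and the three lists only mimic the healthy, dilated and ischemic groups described above.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (lists of floats)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (between / (k - 1)) / (within / (n - k))


def anova_permutation_p(groups, n_permutations=2000, seed=0):
    """Estimate a P-value by reassigning pooled samples to groups at random."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance guards against floating-point ties.
        if f_statistic(shuffled) >= observed - 1e-9:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression values for one gene in three groups.
healthy = [5.1, 4.9, 5.3, 5.0, 5.2]
dilated = [6.0, 6.2, 5.9, 6.1, 6.3]
ischemic = [5.6, 5.4, 5.7, 5.5, 5.8]

p_perm = anova_permutation_p([healthy, dilated, ischemic])
print(p_perm)
```

Because the P-value is estimated from the shuffled data rather than from the F distribution, the smallest value it can report is 1 / (n_permutations + 1), so the number of permutations limits the resolution of the test.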

Journal: Source Code for Biology and Medicine	Publication Date: Oct 21, 2008
Citations: 104	License type: cc-by
