Abstract

Sir, In their article, Khan et al. (1) used the Mann–Whitney test and the Kruskal–Wallis test to compare groups, because they claim that normality was not satisfied. However, in their comparisons between the groups, they are in fact comparing the means of the concentrations. This interest in concentrations is shown in the subtitles used, and throughout the text. The problem is that the Mann–Whitney test and the Kruskal–Wallis test compare the entire distributions associated with the clinical groups, and not just their means. For example, two distributions are unequal if their means are the same but not their variances. Hence Khan et al. are not entitled to conclude that concentration levels are significantly different when the results of the Mann–Whitney test or the Kruskal–Wallis test are significant. To compare the peritoneal fluid (PF) concentration levels between the various clinical groups, it is necessary to compare their mean concentrations. The Central Limit Theorem justifies the assumption of normality for a sample of observations on a variable, when making inferences on the mean of the variable. Khan et al. made this assumption when they used the analysis of variance and the analysis of covariance. They should also have used the assumption of normality instead of resorting to the Mann–Whitney test or the Kruskal–Wallis test, as it is the mean PF concentration levels they are interested in. The problem with the analysis of variance, analysis of covariance or two-sample t-test (which they may consider) is that these assume that the unknown variances of the normal distributions whose means are being compared are equal. As unknown variances do not have to be equal, incorrectly assuming that variances are homogeneous can cause errors. This is not avoided by testing for the equality of variances, particularly because tests on variances are well known to be unreliable (2). 
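The point made above, that a significant Mann–Whitney result need not reflect a difference in means, can be illustrated with a small sketch. The two samples below are invented for illustration and are not the data of Khan et al.; the function name `mann_whitney_normal_approx` is hypothetical, and the p-value uses the large-sample normal approximation without tie or continuity correction, so it is indicative only.

```python
import math
from statistics import mean


def mann_whitney_normal_approx(x, y):
    """Return (U for x, z, two-sided p) using the normal approximation.

    Sketch only: no tie correction and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    # U for x counts the pairs in which the x value exceeds the y value
    # (ties count one half).
    u_x = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
              for xi in x for yj in y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mu) / sigma
    # Two-sided tail probability of a standard normal variable.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, z, p_two_sided


# Hypothetical samples: both have mean 1.0, but very different shapes.
skewed = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.5]
tight = [0.91, 0.93, 0.95, 0.97, 0.99, 1.01, 1.03, 1.05, 1.07, 1.09]

u, z, p = mann_whitney_normal_approx(skewed, tight)
print("means:", round(mean(skewed), 6), round(mean(tight), 6))
print("U =", u, " z =", round(z, 3), " two-sided p =", round(p, 4))
```

In this constructed case the sample means coincide, yet the rank-based test rejects at conventional levels because the two distributions differ in shape.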
Problems caused by incorrectly assuming that variances are homogeneous have already been pointed out (3). Nor does it help to escape this problem by using nonparametric tests such as the Mann–Whitney test or the Kruskal–Wallis test. Apart from the fact, pointed out above, that a significant result from these tests says nothing about the means, they are known to be biased to one side against a two-sided alternative (4). The problem of comparing the means of normal populations with unknown (and not assumed equal) variances at exact significance levels is the well-known Behrens–Fisher problem. I have found its solution in its generalized form, in the frequentist sense (5). This means that the Tsakok solution can compare the means of normal populations at exact unconditional significance levels, without the unrealistic assumption that their unknown variances are equal. The Tsakok solution has moreover been shown (5) to be more sensitive to differences in means even if the unknown variances are in fact equal. The software General Statistical Package (GSP) is used to implement the Tsakok technique. Applied to the data of Khan et al., it finds significant differences in the PF concentrations of interleukin-6 (IL-6) between endometriosis (–) and Stage III–IV (Table II), the PF concentrations of estradiol (E2) between secretory-endometriosis (–) and secretory-endometriosis (+) (Table IV), the PF concentrations of progesterone (P) between secretory-endometriosis (–) and secretory-endometriosis (+) (Table IV) and between proliferative-endometriosis (+) and secretory-endometriosis (+) (Table IV). Each comparison was carried out at the 0.02 significance level (to 2 decimal places). There is far too little overlap between the 99% confidence intervals of the means for each comparison.
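For readers unfamiliar with the Behrens–Fisher setting mentioned above, the following sketch computes Welch's unequal-variance t statistic with the Welch–Satterthwaite degrees of freedom. This is a widely used approximate approach to the same problem; it is not the Tsakok solution and not the GSP software, whose details are not given here. The function name `welch_t` and the two samples are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test (unequal variances).

    An approximate method, shown only to illustrate comparing means
    without assuming equal variances.
    """
    # Estimated variance of each sample mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch–Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical samples with clearly unequal sample variances.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t, df = welch_t(group_a, group_b)
print("t =", round(t, 4), " df =", round(df, 4))
```

The statistic would then be referred to a t distribution with `df` degrees of freedom; because the degrees of freedom are estimated from the data, the resulting significance level is approximate rather than exact.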
In Table III, the GSP finds that there is, at the 0.02 significance level (to 2 decimal places), no significant difference in the PF concentrations of E2 between red lesions and black and white lesions. This illustrates the fact that if two distributions are different, this does not necessarily mean that their means are significantly different. It is not true that “estradiol … levels were significantly higher in women containing red pigments than in women containing other pigments.” There is considerable overlap between the 99% confidence intervals of the means of the two clinical subgroups. The failure to detect some significant differences in the PF concentrations in Tables II and IV and the incorrect claim of significantly different PF concentrations in Table III are just some examples of the consequences of using inappropriate statistical tests. The data need to be reanalyzed and the conclusions reexamined. Using my article (6), which shows how to construct exact unconditional Uniformly Most Powerful Unbiased tests, the Tsakok technique can be extended to compare samples nonparametrically. As such, it supersedes tests such as the Mann–Whitney test or the Kruskal–Wallis test. There is an indication (7) that the technique can be applied to dependent samples. The Tsakok articles are reprinted (8) with further results.
