Robustness of statistical methods when measure is affected by ceiling and/or floor effect.

Matúš Šimkovic, Birgit Träuble, Alan D Hutson

doi:10.1371/journal.pone.0220889

Open Access

Journal: PLoS ONE
Publication Date: Aug 19, 2019
Citations: 74
License type: CC BY 4.0

Affiliation: University of Cologne

Abstract

A simulation study investigated how ceiling and floor effects (CFE) affect the performance of Welch's t-test, F-test, Mann-Whitney test, Kruskal-Wallis test, Scheirer-Ray-Hare test, trimmed t-test, Bayesian t-test, and the "two one-sided tests" equivalence testing procedure. The effect of CFE on the estimate of the group difference and its confidence interval, and on Cohen's d and its confidence interval, was also evaluated. In addition, the parametric methods were applied to data transformed with the log or logit function, and their performance was evaluated. The notion of essential maximum from abstract measurement theory was used to formally define CFE, and the principle of maximum entropy was used to derive probability distributions with an essential maximum/minimum. These distributions allow the magnitude of CFE to be manipulated through a parameter. Beta, Gamma, Beta prime and Beta-binomial distributions were obtained in this way, with the CFE parameter corresponding to the logarithm of the geometric mean. The Wald distribution and ordered logistic regression were also included in the study due to their measure-theoretic connection to CFE, even though these models lack an essential minimum/maximum. Performance in two-group, three-group and 2 × 2 factorial design scenarios was investigated by fixing the group differences in terms of the CFE parameter and by adjusting the base level of CFE. In general, bias and uncertainty increased with CFE. Most problematic were occasional instances of biased inference, which became more certain and more biased as the magnitude of CFE increased. The bias affected the estimate of the group difference, the estimate of Cohen's d and the decisions of the equivalence testing methods. Statistical methods worked best with transformed data, although this depended on the match between the choice of transformation and the type of CFE. The log transform worked well with the Gamma and Beta prime distributions, while the logit transform worked well with the Beta distribution.
Rank-based tests showed the best performance with discrete data, but it was demonstrated that even there a model derived from measurement-theoretic principles may show superior performance. The trimmed t-test showed poor performance. In the factorial design, CFE prevented the detection of main effects as well as the detection of interactions. Irrespective of CFE, the F-test misidentified main effects and interactions on multiple occasions. Five different constellations of main effects and interactions were investigated for each probability distribution, and the weaknesses of each statistical method were identified and reported. As part of the discussion, the use of generalized linear models based on abstract measurement theory is recommended to counter CFE. Furthermore, the necessity of measure validation/calibration studies, which provide the knowledge of CFE needed to design and select an appropriate statistical tool, is stressed.
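
The interplay between a ceiling and a transformation can be illustrated with a minimal sketch. It is not the simulation code of the study: the parametrization through `base`, `shift` and `kappa`, the sample size, and the helper names are all hypothetical choices made for illustration, and only the Python standard library is used. Two Beta-distributed groups are separated by a fixed amount on the logit scale, and `base` moves both groups toward the ceiling at 1. Welch's t statistic is then computed on the raw and on the logit-transformed values.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic (b minus a) and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(x, eps=1e-9):
    x = min(max(x, eps), 1.0 - eps)  # guard against values rounded to 0 or 1
    return math.log(x / (1.0 - x))


def simulate_groups(base, shift=1.0, kappa=20.0, n=200, seed=0):
    """Two Beta samples with means sigmoid(base) and sigmoid(base + shift).

    Hypothetical parametrization (an assumption, not the paper's): a larger
    `base` pushes both groups toward the ceiling at 1.
    """
    rng = random.Random(seed)
    groups = []
    for loc in (base, base + shift):
        mu = sigmoid(loc)
        groups.append([rng.betavariate(mu * kappa, (1.0 - mu) * kappa) for _ in range(n)])
    return groups


for base in (0.0, 3.0):
    g1, g2 = simulate_groups(base)
    raw_diff = statistics.fmean(g2) - statistics.fmean(g1)
    t_raw, _ = welch_t(g1, g2)
    t_logit, _ = welch_t([logit(x) for x in g1], [logit(x) for x in g2])
    print(f"base={base}: raw diff={raw_diff:.3f}, t(raw)={t_raw:.2f}, t(logit)={t_logit:.2f}")
```

Under these assumptions the raw mean difference shrinks as `base` grows, because both group means are squeezed against the ceiling, whereas the separation on the logit scale is fixed by construction.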

Highlights

In 2008, Schnall investigated how participants rate moral dilemmas after they have been presented with words related to the topic of cleanliness, as opposed to neutral words [1], [2].
A quick overview of the variety of suggested analyses: [3] showed that the mean ratings in the replication study were significantly higher than those in the original study. She also showed that the proportion of the most extreme ratings on the 10-point scale was significantly higher in the replication study than in the original study. [4] argued that the rank-based Mann-Whitney test provides results identical to those of an analysis of variance (ANOVA).
In section 1.1.3 we review the derivation of maximum entropy distributions, which extends measurement theory to random variables and, in particular, allows us to derive distributions that can be used to simulate ceiling and floor effects (CFE) and to manipulate their magnitude.
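
As a reminder of how such a derivation typically proceeds, the standard maximum entropy result for the Beta distribution is sketched below. The notation (the multipliers \lambda_i and the constraint functions f_i) is an assumption made here and may differ from the notation used in the paper.

```latex
% Maximum entropy density under moment constraints E[f_i(X)] = c_i:
\[
  p(x) \propto \exp\Big(\sum_i \lambda_i f_i(x)\Big)
\]
% On (0,1) with f_1(x) = \log x and f_2(x) = \log(1 - x):
\[
  p(x) \propto x^{\lambda_1} (1 - x)^{\lambda_2}
  \quad\Longrightarrow\quad
  X \sim \mathrm{Beta}(\alpha, \beta), \qquad
  \alpha = \lambda_1 + 1, \quad \beta = \lambda_2 + 1
\]
% The constrained quantity E[log X] is the logarithm of the geometric mean:
\[
  \mathbb{E}[\log X] = \psi(\alpha) - \psi(\alpha + \beta)
\]
```

Here \psi denotes the digamma function. Constraining E[log X] is what ties a parameter of the distribution to the logarithm of the geometric mean.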

Summary

Introduction

In 2008, Schnall investigated how participants rate moral dilemmas after they have been presented with words related to the topic of cleanliness, as opposed to neutral words [1], [2]. A quick overview of the variety of suggested analyses: [3] showed that the mean ratings in the replication study were significantly higher than those in the original study. [7] investigated how ceiling effects would affect the power of a t-test. He used a graded response model to simulate data that were affected by a ceiling, similar to those obtained in the replication study. The effect size was set to the value obtained in the original study. He found that, depending on the model parametrization, the power of a t-test in the simulated replication study ranges from 70% to 84%, which should be sufficient to detect the effect. [8] performed a Bayes factor analysis and compared the quantiles. Both analyses suggested an absence of an effect in the replication study.
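
A power simulation of this kind can be sketched as follows. This is not the analysis of [7]: an ordered-logistic model with evenly spaced thresholds stands in for the graded response model, the effect size, sample size and threshold values are hypothetical, and the test uses a normal critical value as a large-sample approximation to Welch's t-test. The resulting numbers are therefore illustrative and are not expected to reproduce the 70% to 84% range reported above.

```python
import math
import random
import statistics

# Hypothetical cut points on the latent scale: 9 thresholds give ratings 1..10.
THRESHOLDS = [-4, -3, -2, -1, 0, 1, 2, 3, 4]


def draw_rating(rng, location):
    """Rating from an ordered-logistic model: a latent logistic draw cut at THRESHOLDS."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
    latent = location + math.log(u / (1.0 - u))
    return 1 + sum(latent > c for c in THRESHOLDS)


def rejects(a, b, z_crit=1.96):
    """Two-sided Welch-type test using a normal critical value (large-sample approximation)."""
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    if se2 == 0.0:  # every rating identical, e.g. all at the ceiling
        return False
    z = (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(se2)
    return abs(z) > z_crit


def estimate_power(base, effect, n=100, reps=200, seed=1):
    """Share of simulated studies in which the group difference is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        control = [draw_rating(rng, base) for _ in range(n)]
        treated = [draw_rating(rng, base + effect) for _ in range(n)]
        hits += rejects(control, treated)
    return hits / reps


# A larger `base` places more ratings at the maximum of the scale.
for base in (0.0, 3.0, 7.0):
    print(f"base={base}: estimated power={estimate_power(base, effect=0.8):.2f}")
```

Raising `base` while holding `effect` fixed on the latent scale shows how strongly the estimated power depends on the proportion of ratings at the ceiling.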
