Abstract

Psychometric theory requires unidimensionality (i.e., scale items should represent a common latent variable). One advocated approach to testing unidimensionality within the Rasch model is to identify two item sets from a Principal Component Analysis (PCA) of residuals, estimate separate person measures based on the two item sets, compare the two estimates on a person-by-person basis using t-tests, and determine the number of cases that differ significantly at the 0.05 level. If ≤5% of tests are significant, or if the lower bound of a binomial 95% confidence interval (CI) of the observed proportion overlaps 5%, strict unidimensionality is inferred; otherwise the scale is considered multidimensional. Given its proposed significance and potential implications, this procedure warrants detailed scrutiny. This paper explores how sample size and the method of estimating the 95% binomial CI affect the conclusions drawn under the recommended conventions. Normal approximation, “exact” (Clopper-Pearson), Wilson, Agresti-Coull, and Jeffreys binomial CIs were calculated for observed proportions of 0.06, 0.08, and 0.10 and sample sizes from n = 100 to n = 2500. Lower 95% CI boundaries were inspected for coverage of the 5% threshold. All binomial 95% CIs both included and excluded 5% depending on sample size for all three investigated proportions, except the Wilson, Agresti-Coull, and Jeffreys CIs, which did not include 5% at any sample size when the observed proportion was 10%. The normal approximation CI was the most sensitive to sample size. These results illustrate that the PCA/t-test protocol should be used and interpreted like any hypothesis-testing procedure, and that its outcome depends on sample size as well as on the binomial CI estimation method. The PCA/t-test protocol should not be viewed as a “definitive” test of unidimensionality, and it does not replace an integrated quantitative/qualitative interpretation based on an explicit variable definition in view of the perspective, context, and purpose of measurement.
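
A minimal sketch of the computation described above, assuming Python with statsmodels (the paper does not state which software was used, and the sample sizes below are illustrative points from the n = 100 to n = 2500 range): for each observed proportion and sample size, the five binomial 95% CIs are computed and each lower bound is checked against the 5% threshold.

```python
# Sketch of the CI comparison described in the abstract. statsmodels is an
# assumed tool; the counts of significant tests are back-calculated from the
# observed proportions and sample sizes.
from statsmodels.stats.proportion import proportion_confint

# statsmodels method names for the five interval types examined in the paper
METHODS = {
    "normal approximation": "normal",
    "exact (Clopper-Pearson)": "beta",
    "Wilson": "wilson",
    "Agresti-Coull": "agresti_coull",
    "Jeffreys": "jeffreys",
}

THRESHOLD = 0.05  # the 5% criterion of the PCA/t-test protocol

for p in (0.06, 0.08, 0.10):               # observed proportions of significant t-tests
    for n in (100, 250, 500, 1000, 2500):  # illustrative sample sizes
        count = round(p * n)               # significant tests implied by the proportion
        for label, method in METHODS.items():
            lower, upper = proportion_confint(count, n, alpha=0.05, method=method)
            verdict = "includes 5%" if lower <= THRESHOLD else "excludes 5%"
            print(f"p = {p:.2f}, n = {n:4d}, {label:24s} "
                  f"[{lower:.3f}, {upper:.3f}] -> {verdict}")
```

The printout makes it easy to see, for each CI method and observed proportion, the sample size at which the lower bound first rises above the 5% threshold.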

Highlights

  • Rating scales are one of the most commonly used methods of data collection across a range of disciplines such as behavioural, educational, social and health sciences

  • Young and coworkers [37] used the same methodology with a 17-item scale purported to measure self-efficacy in a sample of n = 309 people with multiple sclerosis and found 12.2% of the person measures from two Principal Component Analysis (PCA)-derived item subsets to differ (95% binomial confidence interval (CI): 9.8%–14.7%)

  • In addition to the influence of sample size, the results presented here illustrate that the choice of method for estimating the 95% binomial CI influences the results and conclusions from using the PCA/t-test protocol for testing unidimensionality in the Rasch model (RM); see the sketch below for the decision rule applied to the Young et al. example
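
As a concrete check of the decision rule, the following sketch applies the lower-bound criterion to the Young et al. example cited above (same assumed statsmodels function; the count of significant tests is back-calculated from the reported 12.2% and is therefore approximate).

```python
# Applying the unidimensionality decision rule to the Young et al. example
# (n = 309, 12.2% significant person-level t-tests). The count of 38 is
# back-calculated from the reported proportion and is therefore approximate.
from statsmodels.stats.proportion import proportion_confint

n = 309
count = round(0.122 * n)  # about 38 significant person-level t-tests

for method in ("normal", "beta", "wilson", "agresti_coull", "jeffreys"):
    lower, upper = proportion_confint(count, n, alpha=0.05, method=method)
    verdict = ("unidimensionality not rejected" if lower <= 0.05
               else "multidimensionality suggested")
    print(f"{method:14s} 95% CI = [{lower:.3f}, {upper:.3f}] -> {verdict}")
```

At this sample size all five lower bounds fall above 5%, consistent with the multidimensionality conclusion reported for that scale.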

Introduction

Rating scales are one of the most commonly used methods of data collection across a range of disciplines, such as the behavioural, educational, social, and health sciences. Rating scales are rooted in the behavioural sciences, and their typical purpose is to enable measurement of phenomena that cannot be directly observed, i.e., latent variables. The measurement of such variables is of central importance. Within the clinical health sciences, rating scales are a prime mode of data collection in descriptive and associative studies as well as in clinical trials of therapeutic interventions. The quality of rating scales is at the heart of evidence-based practice and central to the quality of study results and decision-making [1].

