Abstract

Previous investigations into the validity of acceptability judgment data have focused almost exclusively on type I errors (or false positives) because of the consequences of such errors for syntactic theories (Sprouse & Almeida 2012; Sprouse et al. 2013). The current study complements these previous studies by systematically investigating the type II error rate (false negatives), or equivalently, the statistical power, of a wide cross-section of possible acceptability judgment experiments. Though type II errors have historically been assumed to be less costly than type I errors, the dynamics of scientific publishing mean that high type II error rates (i.e., studies with low statistical power) can lead to increases in type I error rates in a given field of study. We present a set of experiments and resampling simulations to estimate statistical power for four tasks (forced-choice, Likert scale, magnitude estimation, and yes-no), 50 effect sizes instantiated by real phenomena, sample sizes from 5 to 100 participants, and two approaches to statistical analysis (null hypothesis and Bayesian). Our goals are twofold: (i) to provide a fuller picture of the status of acceptability judgment data in syntax, and (ii) to provide detailed information that syntacticians can use to design and evaluate the sensitivity of acceptability judgment experiments in their own research.
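
To make the resampling logic concrete, the sketch below (not the authors' actual pipeline) estimates power for a single two-condition comparison by repeatedly drawing samples of participants and counting how often a significance test detects the difference. The effect size (d = 0.5), the normal model of ratings, the paired t-test, and alpha = .05 are illustrative assumptions; the study itself resamples from real judgment data across four tasks, 50 phenomena, and both frequentist and Bayesian analyses.

```python
# Minimal sketch of resampling-based power estimation for one
# two-condition acceptability contrast (illustrative assumptions only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_power(n_participants, effect_size, n_resamples=1000, alpha=0.05):
    """Estimate power as the proportion of simulated experiments in which
    a paired t-test on the two conditions reaches significance."""
    detections = 0
    for _ in range(n_resamples):
        # Each participant rates both conditions; condition B is shifted
        # by the assumed effect size (in standard-deviation units).
        cond_a = rng.normal(0.0, 1.0, n_participants)
        cond_b = rng.normal(effect_size, 1.0, n_participants)
        _, p_value = stats.ttest_rel(cond_b, cond_a)
        if p_value < alpha:
            detections += 1
    return detections / n_resamples

# Power as a function of sample size for a hypothetical medium effect.
for n in (5, 10, 25, 50, 100):
    print(f"n = {n:3d}  estimated power = {simulate_power(n, 0.5):.2f}")
```

In the study proper, the resamples are drawn from the collected judgment data rather than from a parametric model, so the resulting power estimates reflect the real distributional properties of each task and phenomenon.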

Highlights

  • Acceptability judgments form a substantial component of the empirical foundation of (generative) syntactic theories (Chomsky 1965; Schütze 1996)

  • The result is a database of information regarding the rate of statistical detection that covers a substantial portion of possible experimental designs in syntax

  • We conducted a set of experiments and simulations to cover a wide range of possible experimental designs, fully crossing four acceptability judgment tasks, a set of 50 real phenomena that span a large portion of effect sizes in the literature, sample sizes from 5 to 100 participants, and two approaches to hypothesis testing


Introduction

Acceptability judgments form a substantial component of the empirical foundation of (generative) syntactic theories (Chomsky 1965; Schütze 1996). Our goal in this article is to add one more critical piece of quantitative information to this growing body of knowledge: an empirical estimation of the sensitivity of formal acceptability judgment experiments in detecting theoretically interesting contrasts between different sentence types. We operationalize the notion of sensitivity by estimating and evaluating the rate of statistical detection of acceptability rating differences in a series of resampling simulations based on a large dataset of real pairwise comparisons where putatively real differences of different sizes exist. This rate of detection can be understood as an empirical estimate of the statistical power of each experimental design.
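
As an illustrative formalization (not taken verbatim from the article), the detection rate for a given task, phenomenon, and sample size $n$ can be written as the proportion of resampled experiments whose test reaches the chosen criterion:

$$\widehat{\mathrm{power}}(n) \;=\; \frac{1}{K}\sum_{k=1}^{K}\mathbf{1}\!\left[\,p_k(n) < \alpha\,\right]$$

where $K$ is the number of resampled experiments, $p_k(n)$ is the p-value from the $k$-th resample (or the analogous Bayes factor criterion in the Bayesian analyses), and $\alpha$ is the significance threshold; the values of $K$ and $\alpha$ here are placeholders rather than the article's specific settings.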

