Abstract

Following the quantitative turn in linguistics, the field appears to be in a methodological “wild west” state where much is possible and new frontiers are being explored, but there is relatively little guidance in terms of firm rules or conventions. In this article, we focus on the issue of variable selection in regression modeling. It is common to aim for a “minimal adequate model” and eliminate “non-significant” variables by statistical procedures. We advocate an alternative, “deductive modeling” approach that retains a “full” model of variables generated from our research questions and objectives. Comparing the statistical model to a camera, i.e., a tool to produce an image of reality, we contrast the deductive and predictive (minimal) modeling approaches on a dataset from a corpus study. While a minimal adequate model is more parsimonious, its selection procedure is blind to the research aim and may conceal relevant information. Deductive models, by contrast, are grounded in theory, have higher transparency (all relevant variables are reported) and potentially a greater accuracy of the reported effects. They are useful for answering research questions more directly, as they rely explicitly on prior knowledge and hypotheses, and allow for estimation and comparison across datasets.

Highlights

  • Following the quantitative turn in linguistics, the field appears to be in a methodological “wild west” state where much is possible and new frontiers are being explored, but there is relatively little guidance in terms of firm rules or conventions

  • This paper is intended to raise awareness about how regression modeling can be used as a “camera” to capture linguistic realities

  • When we analyze linguistic data by means of multivariate statistics, we must be aware that different strategies can lead to different results – so we should be clear about our strategy and not assume predictive modeling and “minimal adequate models” as a default

Summary

Linguists and statistical models: cowboys with cameras

Linguistics has been undergoing a methodological paradigm shift towards an increasingly quantitative discipline. An increasing range of statistical methods is being introduced to the field, and while there is a spirit of novelty and exploration, linguistic research finds itself in a kind of “wild west” of quantitative methodology: there are many new prospects for (statistical) analysis, in the rather well-charted territory of regression modeling and beyond, such as Correspondence Analysis (Glynn 2014; Greenacre 2007) or conditional inference trees (Gries 2020; Hothorn et al. 2006; Levshina 2021); there are new methodological frontiers such as mixed effects regression (Gries 2015; Zuur et al. 2009), generalized additive models (Winter and Wieling 2016; Wood 2017) or multidimensional scaling (Borg and Groenen 2005). These new opportunities might have brought along a gold rush, an unrealistic hope for incredible riches (valuable new findings, or perhaps just the gold nuggets of statistically significant effects) to be dug up from the new language data terrain. The photographer’s/researcher’s task is to find an appropriate configuration of the camera/model, given the purpose and conditions at hand.

Variable selection: adjusting the exposure level in our cameras
Deductive versus predictive modeling: a matter of research design
Case study
Deductive models versus backward variable selection
Comparability across models and visualization
Deductive and minimal models: summary and discussion
Further considerations
Concluding remarks
