Abstract

Corpus-based studies of learner language and (especially) English varieties have become more quantitative in nature and increasingly use regression-based methods and classifiers such as classification trees and random forests. One recent development that is becoming more widely used is the MuPDAR (Multifactorial Prediction and Deviation Analysis using Regressions) approach of Gries and Deshors (2014) and Gries and Adelman (2014). This approach attempts to improve on traditional regression- or tree-based approaches by, first, training a model/classifier on the reference speakers (often native speakers in learner corpus studies or British English speakers in variety studies) and, second, using this model/classifier to predict what such a reference speaker would produce in the situation the target speaker (often a non-native speaker or an indigenized-variety speaker) is in. The third step consists of determining whether the target speakers made the choice predicted for the reference speakers (the canonical choice) or not, and exploring that variability with a second regression model or classifier. The present paper is a follow-up to Gries and Deshors (2020) and offers additional answers to a variety of questions that readers and audiences of MuPDAR presentations have been raising for a few years. First, I show how MuPDAR can be extended straightforwardly to alternations that involve more than the typically used binary choices; I do so in a way that also addresses another potential challenge, and I exemplify this with a case study from varieties research. Second, I outline a casewise-similarity approach to predicting what reference speakers would do that avoids frequent regression modeling problems; I exemplify it, and compare it to competing alternatives, with a case study from learner corpus research.
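The three MuPDAR steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the predictors (`animacy`, `length`), the three variants, the toy data, and the function names are all invented for illustration, and a simple majority vote among the most similar reference cases stands in for whatever model or similarity measure the paper actually uses. The sketch only shows the logic of predicting the reference-speaker choice for each target-speaker case and flagging deviations, here with a three-way rather than binary alternation.

```python
from collections import Counter

# Toy reference-speaker data: (predictor values, observed variant).
# All values are hypothetical and serve only to make the sketch runnable.
reference_data = [
    ({"animacy": "animate", "length": "short"}, "variant_a"),
    ({"animacy": "animate", "length": "short"}, "variant_a"),
    ({"animacy": "animate", "length": "long"}, "variant_b"),
    ({"animacy": "inanimate", "length": "long"}, "variant_b"),
    ({"animacy": "inanimate", "length": "long"}, "variant_b"),
    ({"animacy": "inanimate", "length": "short"}, "variant_c"),
]

# Toy target-speaker data in the same format.
target_data = [
    ({"animacy": "animate", "length": "short"}, "variant_a"),
    ({"animacy": "inanimate", "length": "long"}, "variant_a"),
]


def similarity(case_a, case_b):
    """Share of predictors on which two cases have identical values."""
    return sum(case_a[key] == case_b[key] for key in case_a) / len(case_a)


def predict_reference_choice(case, reference, k=3):
    """Step 2: majority variant among the k most similar reference cases.

    Step 1 ("training") reduces to storing the reference cases here,
    because the stand-in predictor is similarity-based.
    """
    ranked = sorted(reference, key=lambda ref: similarity(case, ref[0]), reverse=True)
    votes = Counter(variant for _, variant in ranked[:k])
    return votes.most_common(1)[0][0]


def flag_deviations(targets, reference, k=3):
    """Step 3 (first half): compare each target choice with the prediction."""
    results = []
    for predictors, observed in targets:
        predicted = predict_reference_choice(predictors, reference, k)
        results.append(
            {"observed": observed, "predicted": predicted, "canonical": observed == predicted}
        )
    return results


results = flag_deviations(target_data, reference_data)
for row in results:
    print(row)
```

In a full analysis, the `canonical` flag (or a graded deviation score) would then serve as the response variable of the second regression model or classifier, which this sketch omits.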
