Abstract
For practical and theoretical purposes, tests of second language (L2) ability commonly aim to measure one overarching trait, general language ability, while simultaneously measuring multiple sub-traits (e.g., reading, grammar). This tension between measuring uni- and multi-dimensional constructs concurrently can generate vociferous debate about the precise nature of the construct(s) being measured. In L2 testing, this tension is often addressed through the use of a higher-order factor model wherein multidimensional traits representing subskills load on a general ability latent trait. However, an alternative modeling framework that is currently uncommon in language testing, but gaining traction in other disciplines, is the bifactor model. The bifactor model hypothesizes a general factor, onto which all items load, and a series of orthogonal (uncorrelated) skill-specific grouping factors. The model is particularly valuable for evaluating the empirical plausibility of subscales and the practical impact of dimensionality assumptions on test scores. This paper compares a range of confirmatory factor analysis (CFA) model structures with the bifactor model in terms of theoretical implications and practical considerations, framed for the language testing audience. The models are illustrated using primary data from the British Council’s Aptis English test. The paper is intended to spearhead the uptake of the bifactor model within the repertoire of measurement models used in L2 language testing.
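As a point of reference for readers less familiar with this structure (the notation below is illustrative and is not drawn from the abstract itself), the bifactor model can be written as a measurement model in which every item loads on the general factor and on exactly one skill-specific grouping factor, with all factors constrained to be uncorrelated:

x_{ij} = \lambda_{j}^{G}\,\theta_{i}^{G} + \lambda_{j}^{S_{k(j)}}\,\theta_{i}^{S_{k(j)}} + \varepsilon_{ij}, \qquad \operatorname{Cov}\!\left(\theta^{G}, \theta^{S_k}\right) = \operatorname{Cov}\!\left(\theta^{S_k}, \theta^{S_l}\right) = 0 \quad (k \neq l)

where x_{ij} is person i's response to item j, k(j) indexes the grouping factor (e.g., reading or grammar) to which item j belongs, and \varepsilon_{ij} is the residual. In the higher-order model, by contrast, items load only on their subskill factors, which in turn load on the general ability factor.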
Highlights
Dimensionality considerations are important for both the development and ongoing validation of tests of second language (L2) ability
All models have acceptable fit on the absolute index, the root mean square error of approximation (RMSEA), but none reach an acceptable level on the relative fit index, the comparative fit index (CFI); at 0.896, however, the bifactor model comes very close to the suggested threshold of 0.90 for a reasonable model
In terms of statistical measures of comparative fit, the best fit is achieved by the bifactor model, followed by the correlated-factors model and then the higher-order model, with the comparatively worst fit yielded by the unidimensional model (an illustrative sketch of such a comparison follows these highlights)
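As an illustration of how such a model comparison could be carried out in practice, the sketch below specifies a unidimensional, a correlated-factors, a higher-order, and a bifactor CFA in lavaan-style syntax and extracts fit statistics such as RMSEA and CFI. It is a minimal sketch using the Python package semopy, not the paper's analysis: the item names (r1-r3, g1-g3, l1-l3) and the listening subskill are hypothetical placeholders, the 0* prefix for fixing latent covariances to zero is assumed to be supported, and the Aptis data are not used here.

# Hedged sketch, not the paper's code: comparing CFA structures with semopy.
# `items` is assumed to be a pandas DataFrame with one column per scored item.
import semopy

specs = {
    "unidimensional": """
        general =~ r1 + r2 + r3 + g1 + g2 + g3 + l1 + l2 + l3
    """,
    "correlated_factors": """
        reading =~ r1 + r2 + r3
        grammar =~ g1 + g2 + g3
        listening =~ l1 + l2 + l3
    """,
    "higher_order": """
        reading =~ r1 + r2 + r3
        grammar =~ g1 + g2 + g3
        listening =~ l1 + l2 + l3
        general =~ reading + grammar + listening
    """,
    "bifactor": """
        general =~ r1 + r2 + r3 + g1 + g2 + g3 + l1 + l2 + l3
        reading =~ r1 + r2 + r3
        grammar =~ g1 + g2 + g3
        listening =~ l1 + l2 + l3
        general ~~ 0*reading
        general ~~ 0*grammar
        general ~~ 0*listening
        reading ~~ 0*grammar
        reading ~~ 0*listening
        grammar ~~ 0*listening
    """,
}

def compare_models(items):
    """Fit each specification to the item-level data and print its fit statistics."""
    for name, desc in specs.items():
        model = semopy.Model(desc)
        model.fit(items)                    # maximum likelihood estimation by default
        stats = semopy.calc_stats(model)    # fit statistics, including RMSEA and CFI
        print(name)
        print(stats.T)                      # transpose for easier reading

Note the design contrast: in the correlated-factors specification the subskill factors are allowed to covary freely (the usual default in lavaan-style software), whereas the bifactor specification fixes all latent covariances to zero, which is what makes the grouping factors orthogonal to the general factor.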
Summary
Dimensionality considerations are important for both the development and ongoing validation of tests of second language (L2) ability. Items are written with the aim of assessing highly related but conceptually distinct abilities (e.g., reading, grammar), and it is crucial for a strong validity argument that test constructors are able to isolate and examine the similarities and differences between the various L2 skill areas. The meaningful, evidence-based delineation and reporting of scales and possible subscales, and their appropriate usage, is an essential aspect of making a construct validity argument for a test (Slocum-Gori and Zumbo, 2010). This has particular ramifications for practical decisions regarding score reporting. Achieving a balance between these concurrent theorizations can generate sometimes vociferous debate about the precise nature of, and relationship between, the construct(s) measured.