Abstract
This study was designed to determine how well existing analytic rating scales functioned in the assessment of low- to mid-proficiency Japanese university students’ interactive English speaking ability when engaged in small group discussions. Many-facet Rasch measurement (MFRM) was employed to evaluate the quality of adapted rating scales for complexity, accuracy, and fluency (CAF), interaction, and communicative effectiveness. The video-recorded performances of 64 participants who completed 10-min group discussion tasks at the beginning and end of their first semester of university study were independently rated by four experienced raters using 9-point rating scales, and the resulting scores were subjected to MFRM analysis. Although the scores demonstrated acceptable fit to the Rasch model, closer inspection of the data using Linacre’s (J Appl Meas 3:85–106, 2002a) guidelines for post hoc evaluation of rating scale category quality revealed multiple problems with the 9-point scales and suggested that four major revisions were likely to improve the scales for use in this context. The five 5-point rating scales developed through these revisions were then used by the same raters to reassess the same task performances. The 5-point rating scale data were then subjected to the same MFRM analyses and found to demonstrate notably improved functioning and quality.
Highlights
This study was designed to determine how well existing analytic rating scales functioned in the assessment of low- to mid-proficiency Japanese university students’ interactive English speaking ability when engaged in small group discussions.
Although analytic rating scales are widely used in second language (L2) performance assessments, their many benefits cannot be merely assumed from their use alone, especially when employed in high-stakes testing situations or fine-grained research studies.
To more explicitly situate the results reported so far in relation to the first research question: although the fit and functioning of the participant and rater facets were found to be productive for measurement, close inspection of the rating scale facet revealed six problems with the functioning of the 9-point rating scales.
Summary
Participants
Sixty-four first-year Japanese university students from four intact classes at a private women’s college in western Japan participated in this study.
As a result of this coding procedure, a total of 128 distinct participant codes were considered for the MFRM analysis using Linacre’s FACETS computer software (version 3.68.1). Both the rater and rating criteria facets were centered, and the participant facet was unconstrained. Through an inspection of the absolute values of the standardized residuals (i.e., the standardized differences between observed and expected ratings), the data from the initial 9-point ratings were found to meet Linacre’s (2017a) model-fit stipulations that fewer than about 5% of the residuals be greater than or equal to 2.0 and about 1% or fewer be greater than or equal to 3.0.
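The residual check described above can be illustrated with a minimal sketch. It assumes the standardized residuals have already been exported from the Rasch analysis as a plain list of numbers; the function name `residual_fit_summary`, its parameters, and the example values are hypothetical and are not taken from the study or from FACETS. Because the guideline is stated in approximate terms ("about 5%", "about 1%"), the sketch treats the cut-offs as inclusive upper limits, which is itself an assumption.

```python
from typing import Dict, Sequence, Union


def residual_fit_summary(
    z_residuals: Sequence[float],
    limit_ge_2: float = 0.05,  # assumed reading of "about 5%"
    limit_ge_3: float = 0.01,  # assumed reading of "about 1%"
) -> Dict[str, Union[float, bool]]:
    """Summarize how many standardized residuals are large in absolute value."""
    n = len(z_residuals)
    if n == 0:
        raise ValueError("at least one residual is required")

    abs_z = [abs(z) for z in z_residuals]
    # Proportion of residuals with |z| >= 2.0 and with |z| >= 3.0.
    share_ge_2 = sum(1 for z in abs_z if z >= 2.0) / n
    share_ge_3 = sum(1 for z in abs_z if z >= 3.0) / n

    return {
        "share_ge_2": share_ge_2,
        "share_ge_3": share_ge_3,
        "meets_guideline": share_ge_2 <= limit_ge_2 and share_ge_3 <= limit_ge_3,
    }


if __name__ == "__main__":
    # Made-up residuals for illustration only: 96 small, 3 moderate, 1 large.
    example = [0.4] * 96 + [2.5] * 3 + [-3.2]
    print(residual_fit_summary(example))
```

The thresholds are exposed as parameters so that a stricter or looser reading of the approximate guideline can be tried without changing the function.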