Abstract

Source-based writing, in which writers read or listen to academic content before writing, has been considered a better measure of academic writing skills than independent writing tasks (Read, 1990; Weigle, 2004). Because scores derived from ratings of test takers’ responses to source-based writing tasks are treated as indicators of their academic writing ability, researchers have begun to investigate what scores on source-based academic writing tests mean in an attempt to define the construct such tests measure. Although this research has yielded insights into source-based writing constructs and the rating reliability of such tests, it has been limited in its research perspective, in the methods used to collect data about the rating process, and in the clarity of the connection drawn between reliability and construct validity.

This study collected and analyzed evidence regarding the reliability and construct validity of a source-based academic English placement test, the EPT Writing, and showed the relationship between these two strands of evidence by presenting them in a validity argument (Kane, 1992, 2006, 2013). Specifically, the study examined key aspects of reliability: the appropriateness of the rating rubric, based on raters’ opinions and statistical evidence; rater performance in terms of severity, consistency, and bias; and test score reliability. It also explored the construct of source-based academic writing assessed by the EPT Writing through analysis of the writing features that raters attended to while rating test takers’ responses.

The study employed a mixed-methods multiphase research design (Creswell & Plano Clark, 2012), in which quantitative and qualitative data were collected and analyzed in two sequential phases to address the research questions. In Phase 1, quantitative data consisting of 1,300 operational ratings provided by the EPT Office were analyzed using Many-Facet Rasch Measurement (MFRM) and generalizability theory to address the research questions related to the rubric’s functionality, rater performance, and score reliability. In Phase 2, 630 experimental ratings, 90 stimulated recalls collected with the aid of eye-tracking records, and interviews with the nine raters were analyzed to address the research questions concerning raters’ opinions of the rubric and the writing features that attracted raters’ attention during rating.

The findings were presented in a validity argument to show the connection between the reliability of the ratings and construct validity, a connection that needs to be taken into account in research on rating processes. Overall, the raters’ interviews
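As background for the two measurement frameworks named above, the following is a minimal sketch of how such analyses are commonly specified for rater-mediated writing scores; the abstract does not state the exact models used in the study, so the facets and notation here are illustrative assumptions. In a many-facet Rasch model with examinee, rubric-criterion, and rater facets, the probability of examinee n receiving score category k rather than k−1 from rater j on criterion i is typically modeled as

\[ \log\frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \alpha_j - \tau_k, \]

where \(\theta_n\) is the examinee’s ability, \(\delta_i\) the difficulty of criterion i, \(\alpha_j\) the severity of rater j, and \(\tau_k\) the threshold of category k. In a generalizability-theory analysis of a person-by-rater (p × r) design, score dependability for absolute decisions with \(n_r\) raters is commonly indexed by

\[ \Phi = \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_r + \sigma^2_{pr,e}}{n_r}}, \]

where \(\sigma^2_p\) is person (true score) variance, \(\sigma^2_r\) rater severity variance, and \(\sigma^2_{pr,e}\) the person-by-rater interaction confounded with residual error. Larger rater-related variance components relative to person variance lower \(\Phi\), which is the sense in which rater severity, consistency, and score reliability are connected in analyses of this kind.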
