A large-scale high-stakes university admission exam: pre- and trans-pandemic comparison
This study analyzes performance on a large-scale university admission exam before and during the pandemic. It finds a significant increase in scores in 2020, likely due to increased study time, and no evidence of learning loss, and it highlights gender disparities and the need for long-term impact assessment.
ABSTRACT This study describes the performance of students in the admission exam to a large public university, before and during the pandemic. The instrument had 120 multiple-choice items, with ample validity evidence, and was administered in person in paper-and-pencil format. The exam had satisfactory psychometric properties. Analysis of 616,992 test takers’ scores showed a significant increase in average correct responses in 2020 (48.2%) compared to 2019 (45.8%; p < 0.001, d = 0.148), with a return to near-baseline performance in 2021 (46.7%) and 2022 (45.7%). Male candidates scored higher than female candidates (48.4% vs 45.2%; p < 0.001), and those applying to STEM fields achieved the highest scores. The data show no evidence of pandemic-related learning loss. An initial performance boost occurred, likely due to increased study time and exam preparation during lockdown, followed by stabilization. The study underscores the need to further investigate gender disparities and to track long-term educational effects post-pandemic.
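The 2019 to 2020 effect size can be unpacked with a short calculation. The abstract does not state how d was computed. If it is the conventional Cohen's d (mean difference divided by the pooled standard deviation), then a 2.4-percentage-point gain with d = 0.148 implies a pooled SD of roughly 16 percentage points. An effect of that size is small by Cohen's usual benchmarks. With more than 600,000 test takers, it can still be highly significant. The sketch below is illustrative only. The function names are hypothetical, and it uses no per-year SDs or group sizes from the study.

```python
import math


def pooled_sd(sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(mean1: float, mean2: float, sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Standardized mean difference, assuming the pooled-SD definition of Cohen's d."""
    return (mean1 - mean2) / pooled_sd(sd1, sd2, n1, n2)


def implied_pooled_sd(mean1: float, mean2: float, d: float) -> float:
    """Back out the pooled SD that a reported mean difference and d would imply."""
    return (mean1 - mean2) / d


# Reported means (percent correct) for 2020 and 2019, and the reported d.
sd = implied_pooled_sd(48.2, 45.8, 0.148)
print(f"implied pooled SD ≈ {sd:.1f} percentage points")  # prints: implied pooled SD ≈ 16.2 percentage points
```

The back-calculation only holds under the pooled-SD assumption. If the authors standardized by a different SD (for example, the 2019 SD alone), the implied spread would differ.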
- Research Article
1
- 10.1016/j.asw.2022.100626
- Jul 1, 2022
- Assessing Writing
Examining the social consequences of a locally-developed placement test using test takers’ attitudes
- Research Article
1
- 10.17154/kjal.2016.9.32.3.51
- Sep 30, 2016
- Korean Journal of Applied Linguistics
High-stakes English speaking tests have been used to evaluate applicants’ language proficiency in academic contexts or in the workplace. In these situations, most Korean college students prepare for tests such as TOEIC Speaking or OPIc in order to get better jobs. Although high-stakes tests pressure students and teachers to raise scores, there is little research on how test takers prepare for English speaking tests in cram schools. One aim of this study was to examine and compare TOEIC Speaking and OPIc preparation activities in a cram school. I also aimed to investigate, from the test takers’ perspectives, why and how they choose test preparation, what characterizes that preparation, and how it influences them. Data were collected through observations, interviews, and documents. The results indicated that activities in the cram school were not significantly influenced by test type, suggesting that test takers’ activities may instead be shaped by the teacher’s instruction. The study raises awareness of the importance of teacher instruction for appropriate test preparation, and of its impact on test takers, in the field of language teaching and testing.
- Research Article
1
- 10.22099/jtls.2018.28165.2440
- Mar 1, 2018
- Journal of Teaching Language Skills
Test washback is held to be complicated and multifaceted in that a host of cultural, social, individual, test, and institutional factors are involved in shaping it. Thus far, most washback studies have focused on the role of teachers in test washback or on washback to teachers. How educational environments or institutions might function, in isolation or in interaction with other factors, in shaping washback to learners and test takers has not received adequate research attention. The current study examined the mediating role of academic institutions in washback to learners’ perceptions of test content and test preparation. To this end, 86 senior English students from two universities, one top-tier and one low-tier, completed two questionnaires. The first covered test takers’ preparation practices, including test analysis, test-taking skills, drilling of target skills, and socio-affective strategies. The second covered test takers’ construal of test demands and uses and their expectation of success on the test. Partial least squares structural equation modeling revealed that a washback model based on expectancy-value theory explains a moderate amount of variance in test preparation. Further, for test takers from the low-tier university, favorable perceptions of test content were associated with placing more value on test taking. However, multi-group analysis pointed to group invariance of the model across the two institutions, indicating a lack of strong evidence for the mediating role of educational environments in washback to test takers’ perceptions and preparation.
- Research Article
1
- 10.25073/2525-2445/vnufs.4280
- Aug 2, 2018
- VNU Journal of Foreign Studies
This study investigates ten aspects of test content in two listening tests, IELTS and TOEFL iBT, from the perspective of test takers’ judgments. The main findings reveal both similarities and differences in test takers’ attitudes toward the two tests, although the similarities outweigh the differences. The most obvious difference is that test takers have a more positive attitude toward the IELTS listening test than toward the TOEFL iBT listening test. Test preparation also has a strong effect on test takers’ attitudes toward the test. In addition, positive attitudes toward a test are strongly associated with better test performance. Substantial differences in test takers’ attitudes toward the two listening tests appear in their judgments of difficulty level, new words and technical terms, and familiarity of topics. Test takers found the IELTS listening test less difficult than the TOEFL iBT listening test, with fewer new words and technical terms and more familiar topics. They also found the IELTS test method less challenging than that of the TOEFL iBT listening test, although their choice of which test to take depends heavily on which test they are being prepared for.
- Research Article
23
- 10.1002/ets2.12145
- Apr 20, 2017
- ETS Research Report Series
Language test preparation has often been studied within the consequential validity framework in relation to ethics, equity, fairness, and washback of assessment. The use of independent and integrated speaking tasks in the TOEFL iBT® test represents a significant development and innovation in assessing speaking ability in academic contexts. Integrated tasks that involve synthesizing and summarizing information presented in reading and listening materials have the potential to generate new test preparation strategies. This study investigated the experiences of over 1,500 Chinese test takers and 23 teachers who were preparing for the TOEFL iBT speaking tasks. It examined how often various test preparation activities and materials were used, the reasons for and expectations of taking preparation courses, and the features of those courses. In addition, we examined the usefulness of test preparation from two perspectives: students’ and teachers’ perceptions, and the relationship between test preparation and performance. Data were collected via questionnaires, focus group discussions, interviews with test takers and teachers, and classroom observations. The data showed the following:

(a) Test preparation was a highly complex, multicomponent construct, and teaching and learning test-taking strategies constituted the most prominent feature of intensive preparation courses.
(b) There were significant age-related differences in students’ preparation activities and focuses, although with small effect sizes.
(c) Teachers and students agreed closely in their views on the usefulness of test preparation activities.
(d) There was only a weak relationship between test preparation and performance.

The only significant predictor of students’ test performance was the frequency of their use of the TOEFL Practice Online (TPO®) practice tests.
The findings of the study can enhance our understanding of the pedagogical practices that characterize test preparation programs and contribute to the ongoing validity argument for the TOEFL iBT Speaking test. The implications of the findings for test publishers, test takers, teachers, and test preparation schools are discussed with reference to the instructional, learning, and affective aspects of the multifaceted construct of test preparation.
- Research Article
6
- 10.1080/02602938.2020.1773392
- Jun 4, 2020
- Assessment & Evaluation in Higher Education
Of the many possible institutional and individual factors bearing on test preparation, one is how individuals choose their achievement goals. Yet the literature on the relationship between the two remains slim. The objectives of this study are twofold. First, it explores the range of test preparation practices used by test takers preparing for the English module of the Higher Education Admission Test in Iran. Second, it investigates how individual goal orientations mediate test preparation. A goal orientations scale was translated, validated, and administered to the participants, a convenience sample of 357 test candidates. The participants also completed a test preparation questionnaire with two underlying factors: desired and undesired test preparation practices. Descriptive statistics and paired-samples t-tests revealed that preparation for the Higher Education Admission Test involved a mix of detrimental and beneficial practices, with the former significantly more frequent. Mastery goal orientations were also associated with educationally defensible test preparation practices. The findings carry implications for testers, test preparation instructors, and educational policy makers.
- Research Article
3
- 10.1186/s40468-024-00277-1
- Feb 26, 2024
- Language Testing in Asia
The present study explored the comparability of performance scores between computer-delivered and face-to-face modes for the two speaking tests in the Vietnamese Standardized Test of English Proficiency (VSTEP), namely the VSTEP.2 and VSTEP.3–5 Speaking tests. Comparability was examined in relation to Vietnam’s Six-Level Foreign Language Proficiency Framework (VNFLPF) and to test takers’ experiences. Data were collected from 75 VSTEP.2 and 82 VSTEP.3–5 test takers, all English-major university students, under both computer-delivered and face-to-face conditions. A counterbalanced design was adopted to minimise mode-order effects. After test completion, 30 test takers (15 from each test) were interviewed in focus groups of 3–4 members. The results indicated mixed, selective effects of testing mode. Overall, test scores were comparable on the VSTEP.2 Speaking test but significantly higher in the face-to-face mode on the VSTEP.3–5 Speaking test. However, a statistically significant difference was observed on only one of the many analytic criteria in each test (content development in the former and pronunciation in the latter), with mixed advantages across modes. The interview data provided rich insights into how test takers viewed each testing mode against real-life communication. Their experiences further revealed a wide range of affective preferences. These preferences were tied to the inherent affordances and constraints of each testing mode and to the test takers’ communication and performance/outcome orientations. The findings offer implications for extrapolation, test preparation and administration, and test taker and rater training in the context of the two English speaking proficiency tests in Vietnam, and perhaps beyond.
- Research Article
2
- 10.3389/fpsyg.2022.846413
- Mar 7, 2022
- Frontiers in Psychology
Of the many possible individual factors bearing on test preparation, one is how test takers’ motivational and cognitive perceptions shape test-driven preparation practices. This study reports an investigation into preparation for a high-stakes writing test from the perspective of expectancy-value theory. Undergraduate students (n = 623) preparing for the writing tasks of China’s Graduate School Entrance English Examination (GSEEE) were recruited on a voluntary basis from 11 universities in mainland China. The study identified four perceptions held by GSEEE test takers: goal, task value, task demand, and expectation of success. It also identified five types of preparation practices for the writing tasks: memorizing practice, test familiarization, comprehensive learning, skills development, and drilling practice. Structural equation modeling showed that the expectancy-value model held up well for the paths from test takers’ perceptions to test-driven preparation practices, which were goal-motivated rather than construct-oriented. The test takers’ goal, determined by the high-stakes nature of the admission test, explained their motivation and determined their preparation behavior. Results also indicated that task demand was not a strong factor in test preparation. As such, the findings offer evidence of how an expectancy-value model fits the test preparation mechanism and provide insights into the nature and scope of preparation for high-stakes writing tests.
- Research Article
23
- 10.1177/0265532220927407
- Jun 18, 2020
- Language Testing
A key concern of washback research in language testing is the value of test preparation for facilitating learning and improving test performance. Although test takers may draw on a wide range of preparation activities, most studies of test preparation have taken place in classroom settings, leaving self-access approaches largely unexamined. The current study aimed to (a) explore possible links between self-access test preparation activities and improved test performance and (b) examine how repeat test takers adjust their preparation activities from one test sitting to the next while preparing for the Pearson Test of English (Academic). The study involved the collection and analysis of interviews with 60 recent repeat test takers. The interview data were coded for themes and sub-themes and analyzed for the kinds of test preparation activities in which learners engaged and how these changed over time. The interviews showed that the test takers were strategic in their preparation, changing their approaches depending on their previous test results. The largest number of significant improvements was identified for speaking, where test takers engaged in a variety of strategies, some of which were construct-irrelevant. The findings are discussed in relation to test validity and washback.
- Research Article
6
- 10.1002/j.2330-8516.1983.tb00022.x
- Dec 1, 1983
- ETS Research Report Series
ABSTRACT This study examined the relationship between five methods of test preparation and test performance as measured by Graduate Management Admission Test (GMAT) Verbal (V), Quantitative (Q), and Total (T) scores. Data on method of test preparation were obtained from examinees’ voluntary responses to five questions printed on the answer sheets: “In preparing for this test, did you: (1) study the sample questions in the GMAT registration bulletin? (2) work through an actual GMAT published by ETS? (3) use a book not published by ETS on how to prepare for the GMAT? (4) attend a test preparation or coaching course for the GMAT? (5) undertake on your own any review of mathematics?” One sample of first-time test takers and one sample of second-time test takers were drawn from the 185,525 U.S.-citizen examinees who took the GMAT in 1981–82. Multiple regressions used GMAT scores as dependent variables and test preparation, undergraduate grade point average (UGPA), and sex as independent variables. These were computed separately for first-time examinees in the Afro-American/Black, Caucasian/White, Oriental/Asian, and Spanish-American U.S. citizen subgroups. Regressions that also included first GMAT scores as independent variables were computed for all examinees in the sample who were taking the GMAT for the second time.

FIRST-TIME EXAMINEES
The percentages of first-time examinees choosing each method varied, but the rank order of the frequency of use of each method was consistent across the subgroups. The largest proportion of first-time examinees reported preparing by reviewing the bulletin. In descending order, the remaining methods were using a test preparation book not published by ETS, undertaking their own study of mathematics, working through an actual GMAT, and attending a test preparation course. Examinees choosing the various methods did not differ appreciably in previous academic performance as measured by UGPA, but did vary slightly in age and amount of work experience. Results of the multiple regression analyses for first-time examinees differed across the four subgroups. The size of the coefficient for each method of preparation, its standard error, and the interaction effects between pairs of methods all varied among the subgroups. With the other independent variables held constant, the expected difference in verbal score for a “yes” response to “Studying a test review book not published by ETS” ranged from 1.6 to 3.2 scaled score points and was significant for all four subgroups. The difference in verbal scores for “Studying the Bulletin” ranged from 1.3 scaled score points for Afro-American/Blacks to 4.0 for Oriental/Asians. The effects of using a review book or taking a review course ranged between 0.4 and 1.9 points on verbal and quantitative scores. Negative effects were associated with examinees’ own review of mathematics, and these were attributed to confounding between self-selection and method of preparation.

SECOND-TIME EXAMINEES
The effects associated with each method of preparation were considerably smaller for second-time examinees than for first-time examinees. When previous test performance was held constant, the effect of each method was small. Only the effects of using a test preparation book and of attending a test preparation course differed significantly from zero. The mean GMAT scores of second-time examinees who did and did not use each method differed only slightly and inconsistently. Additionally, the gain over the first administration score was very similar for examinees who did and did not use each method.

CONCLUSIONS
This study indicates that GMAT scores do differ among examinees who use different methods of preparing for the examination. However, when initial ability, as measured by the first GMAT score, was controlled, the effects of studying the GMAT bulletin, working through an actual GMAT, and reviewing mathematics were not significantly different from zero. The effects of the methods were larger for first-time examinees, for whom no previous score was available as a covariate. In those analyses, self-reported UGPA served as a less effective control for ability, and the largest effects associated with any method over another were about 4, 3, and 33 verbal, quantitative, and total score points, respectively. However, these effects are confounded with the characteristics of examinees who choose each method; they result from a combination of self-selection and preparation. Relationships between method of preparation and test scores do appear to exist. However, it must be emphasized that it does not necessarily follow that using any of the methods of preparation causes an increase in scores.
- Research Article
29
- 10.1177/016146811411600701
- Jul 1, 2014
- Teachers College Record: The Voice of Scholarship in Education
Background: In 41 states, students must pass the “basic skills” portion of their licensure exam before they can be admitted into a teacher education program. African American test takers are roughly half as likely as White test takers to pass basic skills exams on their first attempt. This portion of the licensure exam is therefore a key gatekeeper to the field and directly shapes the racial diversity of the profession. Researchers generally frame this problem in one of two opposing ways: (a) by locating the cause in skill and knowledge deficiencies of test takers or (b) by locating the cause in the cultural bias of standardized test instruments. This study looks beyond these two polarized views. It conceptualizes the licensure exam as a testing event that includes a nexus of cognitive and affective processes beyond the specific skills the test is designed to measure.
Focus of Study: The study examined the subjective and social psychological ways in which African American test takers experience teacher licensure testing events. It was guided by the following research questions: (a) How do African American preservice teachers experience the licensure testing event? (b) How does race become a salient aspect of the testing event experience for African American preservice teachers? The study drew on the social psychological constructs of identity contingencies and situational cues to analyze students’ experiences in the testing event.
Setting and Participants: Participants were 22 African American preservice teachers attending a predominantly and historically Black institution in the northeastern United States. Each participant took the paper-format basic skills exam in either the spring 2009 or spring 2010 national administration.
Research Design: Drawing on culturally sensitive research practice, this study used a qualitative case study design to explore test takers’ experiences in the testing event.
Findings/Conclusions: Findings illustrate how the licensure testing event can become a racialized experience for some participants through (a) interactions with test proctors and site administrators before and during examinations and (b) actions of other test takers that inadvertently signaled racial stereotypes about test preparation, intelligence, and character. Racialized experiences were not based on any specific test questions or content. Findings are discussed in light of previous research to suggest that these experiences can produce a host of cognitive and affective states that undermine performance.
- Research Article
107
- 10.1177/0265532212442634
- Jul 9, 2012
- Language Testing
This study introduces expectancy-value motivation theory to explain the paths of influence from perceptions of test design and uses to test preparation, as a special case of washback on learning. Based on this theory, two conceptual models were proposed and tested via structural equation modeling. Data were collected from over 870 test takers of the College English Test Band 4 in China. A perception-of-assessment questionnaire was given at the beginning of a 10-week preparation period, and a test preparation questionnaire was given eight weeks later. Test takers who endorsed high-stakes, instrumental test uses as the primary purpose for taking the test tended to value test taking. Test takers who perceived the test design positively tended to attach high importance to test taking and appeared more confident. Furthermore, higher endorsement of task value and higher expectation of test success jointly contributed to greater engagement in test preparation. Knowledge of the test was also related to increased self-regulation in test preparation and more practice of test-taking skills.
- Research Article
- 10.1016/j.heliyon.2024.e40579
- Nov 22, 2024
- Heliyon
Does Perceived Test Fairness Affect Test Preparation? -- A Case Study of Duolingo English Test
- Research Article
20
- 10.1080/09588221.2019.1704788
- Jan 8, 2020
- Computer Assisted Language Learning
A fundamental requirement of language assessments that remains under-researched for computerized assessments is impartiality (fairness), or the equal treatment of test takers regardless of background. The present study evaluated fairness in the Pearson Test of English (PTE) Academic Reading test, a computerized reading assessment, by investigating differential item functioning (DIF) across Indo-European (IE) and Non-Indo-European (NIE) language families. Previous research has shown that similarities between a reader’s mother tongue and the second language being learned can advantage some test takers. To test this hypothesis, we analyzed data from 783 international test takers of the PTE Academic using the partial credit model in Rasch measurement. We examined two main types of DIF:

- Uniform DIF (UDIF) occurs when an item gives a particular group of test takers a consistent advantage across all ability levels.
- Non-uniform DIF (NUDIF) occurs when the performance difference between groups varies across the ability continuum.

The results showed no statistically significant UDIF (p > 0.05), but 3 of the 10 items showed NUDIF across the language families. No mother tongue advantage was observed. The absence of UDIF could be explained by similarity in test takers’ computer and Internet skills, test preparation, and language policies. Post hoc content analysis of the items suggested two contributors to the NUDIF items: a decrease in mother tongue advantage for IE test takers in high-proficiency groups, and lucky guesses by low-ability test takers. Lastly, recommendations for investigating social and contextual factors are proposed.
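Item response curves make the distinction between uniform and non-uniform DIF concrete. The sketch below is a conceptual illustration only. It uses a generic dichotomous two-parameter logistic curve with hypothetical parameters, not the partial credit Rasch model the study applied. In Rasch models all items share one slope, so non-uniform DIF is typically detected as a group-by-ability interaction rather than as a slope difference. Under uniform DIF the reference-minus-focal gap keeps the same sign at every ability level. Under non-uniform DIF the curves cross, so the advantaged group changes along the ability continuum.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """P(correct) under a two-parameter logistic curve (slope a, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def group_gap(theta: float, ref: tuple, focal: tuple) -> float:
    """Reference-minus-focal difference in P(correct) at ability theta."""
    return p_correct(theta, *ref) - p_correct(theta, *focal)


abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Uniform DIF (hypothetical parameters): same slope, item 0.5 logits harder for the focal group,
# so the reference group is ahead at every ability level.
uniform = [group_gap(t, (1.0, 0.0), (1.0, 0.5)) for t in abilities]

# Non-uniform DIF (hypothetical parameters): same difficulty, different slopes,
# so the curves cross at theta = b = 0 and the advantage flips sign.
nonuniform = [group_gap(t, (1.5, 0.0), (0.7, 0.0)) for t in abilities]

for t, u, n in zip(abilities, uniform, nonuniform):
    print(f"theta={t:+.1f}  uniform gap={u:+.3f}  non-uniform gap={n:+.3f}")
```

In the non-uniform case the gap is negative below theta = 0, zero at theta = 0, and positive above it. Aggregating over all ability levels can therefore mask an item-level fairness issue that a uniform-DIF test would miss.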
- Research Article
1
- 10.1002/j.2333-8504.2001.tb01856.x
- Dec 1, 2001
- ETS Research Report Series
ABSTRACT This study investigated the extent and nature of preparation for the Pre-Professional Skills Tests (PPST®) and the reasons for preparing or not preparing. It also examined how these results differed between White and minority-group test takers and between middle-class and working-class test takers. Recent PPST test takers were surveyed. Preparation for the PPST was limited and mainly involved free or inexpensive activities, such as taking a sample test. The reported reasons for not preparing and the empirical correlates of measures of preparation were primarily attitudinal. Ethnic-group and social-class differences in the extent and nature of test preparation were minimal. There were, however, some differences in reported reasons and correlates of preparation. These were primarily lower awareness of test preparation resources among White and middle-class test takers, and few correlates of test preparation among Black test takers.