English Items Research Articles

The purpose of this study was to evaluate the equivalence of two translated tests using statistical and judgmental methods. Performance differences for a large random sample of English- and French-speaking examinees were compared on grade 6 mathematics and social studies provincial achievement tests. Items displaying differential item functioning (DIF) were flagged using three popular statistical methods (Mantel-Haenszel, Simultaneous Item Bias Test, and logistic regression), and the substantive meaning of these items was studied by comparing the back-translated form with the original English version. The items flagged by the three statistical procedures were relatively consistent, but not identical, across the two tests. The correlations between the DIF effect size measures were also strong, but far from perfect, suggesting that two procedures should be used to screen items for translation DIF. To identify the DIF items with translation differences, the French items were back-translated into English and compared with the original English items by three reviewers. Two of seven DIF items in mathematics and six of 26 DIF items in social studies were judged to be nonequivalent across language forms due to differences introduced in the translation process. There were no apparent translation differences for the remaining items, revealing the necessity for further research on the sources of translation differential item functioning. Results from this study provide researchers and practitioners with a better understanding of how three popular DIF statistical methods compare and contrast. The results also demonstrate how statistical methods inform substantive reviews intended to identify items with translation differences.
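
As a rough illustration of the first of the three methods named above, the following sketch computes the Mantel-Haenszel common odds ratio and its conversion to the ETS delta scale for a single item. It is not the study's code: the function name `mantel_haenszel_dif`, the tuple layout, and the counts are hypothetical, and the reference and focal groups (for example, English and French examinees) are assumed to have been matched on total test score beforehand.

```python
import math

def mantel_haenszel_dif(strata):
    """Return (alpha_MH, delta_MH) for one item.

    strata: one tuple per matched score level, in the assumed order
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    num = 0.0
    den = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        n = ref_right + ref_wrong + foc_right + foc_wrong
        if n == 0:
            continue  # an empty score level contributes nothing
        num += ref_right * foc_wrong / n
        den += ref_wrong * foc_right / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio is undefined for these counts")
    alpha = num / den                  # common odds ratio across strata
    delta = -2.35 * math.log(alpha)    # ETS delta scale; 0 means no DIF
    return alpha, delta

# Hypothetical counts for one item at two matched score levels.
alpha, delta = mantel_haenszel_dif([(40, 10, 20, 20), (30, 10, 15, 5)])
print(round(alpha, 3), round(delta, 3))
```

A ratio above 1 (negative delta) means the item favors the reference group after matching. A flag of this kind only says that an item behaves differently across groups; deciding whether translation caused the difference still takes the judgmental review the abstract describes.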

In order to meet the needs of the Test of English as a Foreign Language (TOEFL®) constituencies, the TOEFL program is sponsoring a development project known as TOEFL 2000. Drawing from current linguistic theory and models of communicative competence, it is anticipated that the new test or test battery developed by the TOEFL 2000 project will likely be designed to test all four language skills (reading, writing, listening, and speaking) in an integrated fashion. However, one compromise level or position on integration of skills is one in which reading and writing would be tested together, and listening and speaking also tested together. It is also assumed that the test will largely be performance-based, meaning a substantial portion of the items on the test will likely be constructed-response items, and an examinee's score on such items will be in one of multiple ordered categories.

Two groups of item response theory (IRT) models have been developed to calibrate items with multiple ordered categories (i.e., polytomously scored items): (a) the partial credit model (Masters, 1982) and the generalized partial credit model (Muraki, 1992); and (b) the graded response model (Samejima, 1969, 1972). These models have been used jointly with the dichotomous three-parameter logistic (3PL) IRT model to concurrently calibrate dichotomously and polytomously scored items for the National Assessment of Educational Progress (NAEP). However, the performance of these polytomous IRT models and the concurrent calibration of dichotomously and polytomously scored items have not been investigated with data from the TOEFL examinee population.

The purpose of this study was to obtain a good understanding of the performance of a combination of dichotomous and polytomous IRT models with TOEFL data. TOEFL Vocabulary and Reading Comprehension and Test of Written English (TWE®) items, and TOEFL Listening Comprehension and Test of Spoken English (TSE®) items were concurrently calibrated using a combination of the generalized partial credit model and the 3PL IRT model. The two sets of combined items were also concurrently calibrated using a combination of the graded response model and the 3PL IRT model.

The results of this study indicate that data from a reading/writing combination made up of the TOEFL Vocabulary and Reading Comprehension section and the TWE were reasonably well fit by a combination of the 3PL and generalized partial credit models or 3PL and graded response models. In a similar fashion, data for a listening/speaking combination made up of the TOEFL Listening Comprehension section and selected tasks from the TSE were also reasonably well fit by the 3PL/generalized partial credit and 3PL/graded response model combinations.

A variety of comparisons across the generalized partial credit and graded response models seem to indicate some preference for using the generalized partial credit model when PARSCALE is used as the calibration program. The results of this study provide useful information about test construction and item calibration procedures that might later be used for the TOEFL 2000 project.
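
To make the contrast between the two polytomous model families more concrete, the sketch below computes category probabilities for one item under the generalized partial credit model and under the graded response model. It uses the textbook logistic forms only. The function names and parameter values are hypothetical, the scaling constant D = 1.7 is omitted, and no attempt is made to reproduce PARSCALE's own parameterization or estimation.

```python
import math

def gpcm_probs(theta, a, steps):
    """Generalized partial credit model: P(X = k | theta) for k = 0..m.

    steps holds the m step parameters b_1..b_m; category 0 has an empty sum.
    """
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))   # cumulative sum of a * (theta - b_v)
    top = max(z)                            # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in z]
    total = sum(weights)
    return [w / total for w in weights]

def grm_probs(theta, a, thresholds):
    """Graded response model: differences of adjacent cumulative curves.

    thresholds must be in increasing order so every probability is nonnegative.
    """
    cum = [1.0]                             # P(X >= 0) is always 1
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)                         # P(X >= m + 1) is always 0
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical four-category item, evaluated at ability theta = 0.5.
print([round(p, 3) for p in gpcm_probs(0.5, 1.2, [-1.0, 0.0, 1.0])])
print([round(p, 3) for p in grm_probs(0.5, 1.2, [-1.0, 0.0, 1.0])])
```

With a single step or threshold, both functions reduce to the same two-parameter logistic curve. They diverge once an item has three or more score categories, which is the situation where a comparison of model fit such as the one reported above becomes informative.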

Articles published on English Items

Examining Language in Context: The Need for New Research and Practice Paradigms in the Testing of English-Language Learners

Can ESP be tested by EGP

ASSESSING HANDWRITING SPEED OF CHILDREN BILITERATE IN ENGLISH AND CHINESE

Shona-English code-mixing in the speech of students at the University of Zimbabwe

Code-switching and lexical borrowing: Which is what in Ghanaian English?

"Mixed" Etymologies of Middle English Items in OED3: Some Questions of Methodology and Policy

Asymmetric activation of number codes in bilinguals: further evidence for the encoding complex model of number processing.

Making phonology functional: What do I do first? Shelley L. Velleman. Boston: Butterworth-Heinemann, 1998. Pp. 228.

Ghanaianisms

Translation and validation of Caregiving Satisfaction Scale into Korean.

A Phonological Account For The Cross-Language Variation In Working Memory Processing

Subject Searching in Online Catalogs Including Spanish and English Material

Where is the hierarchy of academic self-concept?

Using Statistical and Judgmental Reviews to Identify and Interpret Translation Differential Item Functioning

ON SENTENTIAL NEGATION AND THE LICENSING OF NEGATIVE POLARITY ITEMS IN ENGLISH AND JAPANESE: A MINIMALIST APPROACH

Library Instruction and Information Literacy–1997

Dictionary of Caribbean English Usage

Ironic Context-Free Ironies in Thai as Conventionalized Implicatures

Library Instruction and Information Literacy–1996

CONCURRENT CALIBRATION OF DICHOTOMOUSLY AND POLYTOMOUSLY SCORED TOEFL ITEMS USING IRT MODELS
