Abstract

The results of equating studies are supportive of the use of IRT methods for TOEFL equating, but there remains a discrepancy between the assumptions of the equating and the diversity of the population served. The IRT model assumes that, except for chance effects, the performance of individuals is entirely accounted for by their status on a single latent proficiency variable. But the TOEFL candidate population appears to be sufficiently diverse that different groups might exist, each with its own latent variable. Informal studies indicate that such is the case. The purpose of the study reported here was to identify such groups, if any, and to explore the implications of the results.

The basic and very surprising outcome of this study was the finding that a single factor accounted for virtually all the proportions of joint item successes. This result was obtained for all three TOEFL sections. Also for each section, proportions of joint item success were proportional to the products of the item difficulties. Both of these results indicate that latent group effects are small.

The search for groups also consisted of assessing the relative importance of two factors in accounting for examinees' patterns of item success. For an individual, this relative importance was expressed by the angle in a polar coordinate plot of pairs of coefficients of regression of item scores on factor loadings. A large latent group would appear as a mode in a frequency distribution of angles; multiple modes would indicate several latent groups. But a rather smooth distribution of angles for the total group was found, with some ordering, but a great deal of overlap, of the distributions by national origin and language background. Neither national origin nor native language accounted for more than 25% of the variation in the angles.

The results of the present study are consistent with the use of section equating using item response theory and with the use of the restrictive assumption of proportionality of item response curves; the probability of correct response is a product of two components, a function of ability and an item constant. This is a single parameter model that could serve as a basis for item calibration and equating. The model would lead to simplified procedures. The result also suggests that, when assembling operational tests, using those items for which the proportion of joint successes is most nearly proportional to their difficulties would reduce latent group effects.

The Test of English as a Foreign Language is offered for use by educational institutions with the expectation that the scores from TOEFL examinees are informative on a single scale, essentially without reference to specific examinee characteristics. The present study provides evidence consistent with that expectation. One should be prepared to examine small effects, go outside the domain of measurement, or go outside the typical range of TOEFL examinees to seek substantial departures from the unitary hypothesis.
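
As a rough illustration of why a product-form response model goes together with the proportionality of joint item successes described in the abstract, the following Python sketch computes exact expected proportions under stated assumptions. The ability grid, the weights, the item constants, and the logistic form of `ability_component` are all hypothetical choices made for illustration, not values or functions from the study. Item difficulty is taken here to mean the proportion of examinees answering correctly, and responses are assumed to be independent given ability.

```python
from itertools import combinations
from math import exp

# Hypothetical discrete ability distribution (an assumption, not TOEFL data).
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [0.2, 0.2, 0.2, 0.2, 0.2]

# Hypothetical item constants c_j, one per item.
item_constants = [0.9, 0.7, 0.5, 0.3]


def ability_component(theta):
    """Assumed ability function g(theta); a logistic curve is used only for illustration."""
    return 1.0 / (1.0 + exp(-theta))


def expected(values):
    """Weighted mean of values over the ability distribution."""
    return sum(w * v for w, v in zip(weights, values))


g = [ability_component(theta) for theta in abilities]

# Product model: P(correct | theta, item j) = c_j * g(theta),
# so the proportion correct for item j is c_j * E[g].
difficulty = [c * expected(g) for c in item_constants]


def joint_success(j, k):
    """Expected proportion answering both items j and k correctly,
    assuming responses are independent given ability."""
    return expected(
        [item_constants[j] * gi * item_constants[k] * gi for gi in g]
    )


# Ratio of joint success to the product of difficulties, for every item pair.
pairs = list(combinations(range(len(item_constants)), 2))
ratios = [joint_success(j, k) / (difficulty[j] * difficulty[k]) for j, k in pairs]

for (j, k), r in zip(pairs, ratios):
    print(f"items {j},{k}: ratio = {r:.6f}")
```

Under these assumptions every ratio equals E[g²]/(E[g])², so it does not depend on which pair of items is chosen. That constant ratio is the pattern the abstract interprets as a sign that latent group effects are small.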
