Abstract

Two independent replications of a sequence of simulations were carried out to help diagnose and interpret the equating differences found between representative (random) and matched (nonrandom) samples. The study covered three commonly used conventional observed-score equating procedures (Tucker, Levine equally reliable, and chained equipercentile) and one item response theory (IRT)-based procedure (three-parameter logistic, or 3PL, model true-score equating). The results support the theory-based prediction that observed-score equating methods such as Tucker and chained equipercentile are more affected by sample variation than are a true-score equating method (3PL IRT) and an observed-score method based on true-score assumptions (Levine equally reliable). The results further suggest that matching equating samples on a fallible measure of ability may be inadvisable for every equating method studied, except possibly the Tucker method.
