Abstract

Introduction: Test-centred methods for setting standards in knowledge tests (e.g., Angoff or Ebel) are known to be problematic, with expert judges unable to consistently predict the difficulty of individual items. A different approach is the Cohen method, which benchmarks the difficulty of the test against the performance of the top candidates.

Methods: This paper investigates the extent to which the Ebel method (and also the Cohen method) produces a consistent standard in a knowledge test when comparing adjacent cohorts. The two tests are linked using common anchor items and Rasch analysis to place all items and all candidates on the same scale.

Results: The two tests are of a similar standard, but the two cohorts differ in their average abilities. The Ebel method is entirely consistent across the two years, but the Cohen method appears less so, while the Rasch equating itself has complications of its own, such as evidence of overall misfit to the Rasch model and a change in difficulty for some anchor items.

Conclusion: Based on our findings, we advocate a pluralistic and pragmatic approach to standard setting in such contexts, and recommend using multiple sources of information to inform the decision about the correct standard.
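For readers unfamiliar with the methods named above, the sketch below illustrates the Rasch item response function that underlies this kind of equating, together with a Cohen-style cut-score calculation. It is a minimal illustration, not the authors' implementation: the 95th-percentile benchmark and 0.60 multiplier are common choices from the wider Cohen-method literature and are assumed here, and the data are simulated.

    import numpy as np

    def rasch_prob(theta, b):
        """Rasch model: probability that a candidate of ability theta
        answers an item of difficulty b correctly (both on the logit scale)."""
        return 1.0 / (1.0 + np.exp(-(theta - b)))

    def cohen_cut_score(total_scores, percentile=95, multiplier=0.60):
        """Cohen-style cut score: a fixed fraction of the score achieved by a
        high-performing reference candidate. The 95th percentile and 0.60
        multiplier are common choices in the literature, assumed here for
        illustration; they are not taken from this paper."""
        benchmark = np.percentile(total_scores, percentile)
        return multiplier * benchmark

    # Illustrative use on simulated data (hypothetical, not the study's data):
    rng = np.random.default_rng(seed=0)
    scores = rng.normal(loc=65, scale=10, size=300)  # percentage scores
    print(f"Rasch P(correct) at theta=1.0, b=0.5: {rasch_prob(1.0, 0.5):.3f}")
    print(f"Cohen cut score: {cohen_cut_score(scores):.1f}%")

Because the Cohen cut score is anchored to the observed performance of a cohort's strongest candidates, a shift in cohort ability moves the standard even when test difficulty is unchanged, which is consistent with the apparent inconsistency reported in the Results.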
