Abstract
Background: Conceptualising the Borderline candidate is one of the most difficult tasks in standard setting, yet it is central to the process. Here we set out to develop a methodology by which the score of Borderline candidates can be retrospectively calculated from the Facility index (the percentage of candidates answering an item correctly) of assessment items. Methods: We explored the performance of all candidates in one academic year in one UK medical school, covering 26 separate assessments. Each assessment had previously been standard set by either the Angoff or the Borderline Regression method. We identified Borderline candidates by reviewing their performance across all assessments in their year. A student was classed as 'Borderline' if they were within 1 Standard Error of Measurement above the pass score, or below the pass score; a variety of cut-off points were explored experimentally. We plotted the item scores of the Borderline candidates, as calculated by each method, against Facility for the whole cohort, and fitted curves to the resulting distributions. Results: Borderline candidate scores intersected the self-plot of all candidate scores (whole-cohort Facility plotted against itself) at two points: a Facility of 100% and a Facility of 20%. These correspond, respectively, to all candidates getting the item correct and all candidates guessing the outcome. Borderline candidates showed a strongly curvilinear relationship with the whole cohort. This relationship was well described by an exponential of the form y ≈ C·exp(F·x), where y is the Facility of Borderline candidates on an item, x is the observed Facility of the whole cohort on that item, and C and F are constants. We found that C and F had similar values under different conditions. Using the typical values for C and F and the observed cohort Facility, we could predict the probable Facility for Borderline candidates over the test: in other words, we could calculate the appropriate cut score for Borderline candidates.
Differentiating the equation indicates where the assessment ought to be most sensitive. Conclusions: This approach can be used to standard-set assessments in their entirety when they are low stakes or norm referenced, in preference to Cohen methods. Whereas Cohen methods are based on the performance of one candidate (or a very small number of candidates), this exponential method is based on all candidates and all items and is therefore more robust. In high stakes assessments, it can be used to correct values where the Facility is very different from the standard-set value, and it could be used in this context for the national exam proposed by the UK General Medical Council. It could also be used to standard set novel items such as Very Short Answer formats, where standard setting panels are unfamiliar with the expected performance of these items.
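
The exponential relationship y ≈ C·exp(F·x) can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical. The log-linear least-squares fit is one simple way to estimate C and F and may differ from the curve fitting used in the study. The constants used in the example are derived only from the two reported intersection points (Facility of 20% and 100%, written as proportions 0.20 and 1.00), not from the study's fitted values, and the item Facilities are invented for illustration.

```python
import math


def fit_exponential(xs, ys):
    """Estimate C and F in y = C*exp(F*x) by least squares on ln(y).

    Assumption: a log-linear fit is used here for simplicity; the study
    does not state which fitting procedure was applied.
    """
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    F = sxl / sxx
    C = math.exp(mean_l - F * mean_x)
    return C, F


def constants_from_intercepts(x_low=0.20, x_high=1.00):
    """Solve for C and F so the curve passes through (x_low, x_low)
    and (x_high, x_high), i.e. the two points where it meets y = x."""
    F = math.log(x_high / x_low) / (x_high - x_low)
    C = x_high * math.exp(-F * x_high)
    return C, F


def borderline_facility(x, C, F):
    """Predicted Facility of Borderline candidates for cohort Facility x."""
    return C * math.exp(F * x)


def slope(x, C, F):
    """Derivative dy/dx = C*F*exp(F*x) of the fitted curve at x."""
    return C * F * math.exp(F * x)


def predicted_cut_score(cohort_facilities, C, F):
    """Mean predicted Borderline Facility across items, as a proportion."""
    predictions = [borderline_facility(x, C, F) for x in cohort_facilities]
    return sum(predictions) / len(predictions)


# Illustrative constants from the two intersection points only.
C, F = constants_from_intercepts()

# Hypothetical whole-cohort item Facilities (proportions), invented here.
example_items = [0.45, 0.60, 0.75, 0.90]

if __name__ == "__main__":
    print("C =", round(C, 4), "F =", round(F, 4))
    print("cut score =", round(predicted_cut_score(example_items, C, F), 4))
    print("slope at 0.75 =", round(slope(0.75, C, F), 4))
```

Because the exponential is convex and meets the line y = x at both intersection points, the predicted Borderline Facility lies below the cohort Facility everywhere between them. In practice, C and F would be estimated from real item data with something like `fit_exponential` rather than from the two intersection points alone.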