Abstract
Software that helps higher education instructors remove poor-quality test items and set appropriate grade boundaries is generally lacking. To address these challenges, the SmartStandardSet system provides a graphical user interface for removing defective items, weighting student scores using a two-parameter item response theory (IRT) model, and a mechanism for standard setting. We evaluated the system through six interviews with teachers and six focus groups involving 19 students to understand how key stakeholders would view the use of the tool in practice. Both groups generally reported high levels of feasibility, accuracy, and utility in SmartStandardSet's statistical scoring of items and score calculation for test-takers. Teachers indicated that the data displays would help them improve future test items; students indicated that the system would be fairer and would motivate greater effort on more difficult test items. However, both groups had concerns about implementing the system without institutional policy endorsement. Students were specifically concerned that academics might set grade boundaries on arbitrary and invalid grounds. Our results provide useful insights into the perceived benefits of using the tool for standard setting and suggest concrete next steps for gaining wider acceptance, which will be the focus of future work.
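For orientation, the two-parameter logistic (2PL) model underlying this kind of IRT analysis is conventionally written as below; the notation is a standard formulation supplied here for reference, not reproduced from the paper itself:

\[
P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}},
\]

where \(\theta_i\) is the latent ability of student \(i\), \(a_j\) is the discrimination of item \(j\), and \(b_j\) is its difficulty. Weighting scores with this model means that correct answers to hard, highly discriminating items contribute more to a student's estimated ability than correct answers to easy ones.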
Highlights
Grade boundaries for tests are usually related to the proportion of items answered correctly
We describe a software system, SmartStandardSet, that automates IRT analysis of multiple-choice question (MCQ) test items, calculates weighted scores for students, and allows grade boundaries to be set according to standards-based judgements by higher education instructors
Instructors generally perceived the scores created by SmartStandardSet as accurate and understood how the system removed the guesswork from producing a credible, statistically informed score
Summary
Grade boundaries for tests are usually related to the proportion of items answered correctly. This is potentially misleading because test difficulty or easiness is not considered (e.g., easy tests produce high scores). There may also be resistance from both students and lecturers to accepting IRT-based scoring in environments where it is not approved by policy. This paper addresses both gaps by: 1) describing a newly developed prototype tool, SmartStandardSet, for performing test quality evaluation and standard setting, and 2) conducting an exploratory pilot evaluation from the perspective of intended stakeholders concerning the utility, feasibility, accuracy, and propriety of the system. This preliminary evaluation gauges the acceptance of a potentially major change in how multiple-choice tests are evaluated and prepared for grading; despite its small scale, it is warranted and provides useful insights.
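To make the contrast with proportion-correct grading concrete, the Python sketch below is purely illustrative: the item parameters, response patterns, and crude grid-search estimator are our assumptions, not SmartStandardSet's implementation. It shows how two students with the same raw score can receive different 2PL ability estimates when the items they miss differ in difficulty and discrimination.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response for ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical three-item test: two easy items, one hard but discriminating.
items = [(0.8, -1.5), (0.9, -1.0), (1.6, 1.2)]  # (a, b) pairs

# Two response patterns with the same raw score (2 of 3 correct):
# student A misses the hard item; student B misses an easy one.
responses_a = [1, 1, 0]
responses_b = [0, 1, 1]

def log_likelihood(theta, responses):
    """Log-likelihood of a response pattern under the 2PL model."""
    ll = 0.0
    for (a, b), x in zip(items, responses):
        p = p_correct(theta, a, b)
        ll += math.log(p if x else 1.0 - p)
    return ll

def mle_theta(responses, lo=-4.0, hi=4.0, steps=800):
    """Crude grid-search maximum-likelihood estimate of ability."""
    grid = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return max(grid, key=lambda t: log_likelihood(t, responses))

print(f"theta A: {mle_theta(responses_a):+.2f}")  # roughly +0.4
print(f"theta B: {mle_theta(responses_b):+.2f}")  # roughly +1.4, despite the same raw score
```

Under these (hypothetical) parameters, the student who answers the harder, more discriminating item correctly receives the higher ability estimate even though both raw scores are 2/3; this difficulty-aware weighting is exactly what proportion-correct grading ignores.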