Sampling knowledge and understanding: how long should a test be?

Richard F Burton

doi:10.1080/02602930600679589

Abstract

Many academic tests (e.g. short‐answer and multiple‐choice) sample required knowledge with questions scoring 0 or 1 (dichotomous scoring). Few textbooks give useful guidance on the length of test needed to do this reliably. Posey's binomial error model of 1932 provides the best starting point, but allows neither for heterogeneity of question difficulty and discriminatory power nor for students' uneven spread of knowledge. Even with these taken into account, it appears that tests of 30–60 items, as commonly used, must generally be far from adequate. No exact test length can be specified as ‘just sufficient’, but the tests of 300 items that some students take are not extravagantly long. The effects on reliability of some particular test forms and practices are discussed.
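To make the dependence of precision on test length concrete, here is a minimal sketch of a simple binomial error calculation. It assumes that every item is an independent random draw from the required knowledge, and that a student who knows a proportion p of that knowledge answers each item correctly with probability p. This is an illustrative reading of a binomial error model, not necessarily Posey's exact formulation, and it ignores guessing, heterogeneity of question difficulty and uneven spread of knowledge. The function name and the choice of p = 0.5 are assumptions made for the example.

```python
import math


def binomial_standard_error(p: float, n_items: int) -> float:
    """Standard error of a proportion-correct score under a simple binomial model.

    Assumption (illustrative, not taken from the article): each of the
    n_items questions is answered correctly with the same probability p,
    independently of the others.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n_items <= 0:
        raise ValueError("n_items must be a positive integer")
    return math.sqrt(p * (1.0 - p) / n_items)


if __name__ == "__main__":
    # p = 0.5 gives the largest standard error for any given test length.
    for n in (30, 60, 300):
        se = binomial_standard_error(0.5, n)
        print(f"{n:>3} items: standard error = {se:.3f}")
```

Under these assumptions the standard error at p = 0.5 is about 0.09 of the total mark for a 30-item test and about 0.03 for a 300-item test, so a tenfold increase in length reduces the sampling error only by a factor of about 3.2 (the square root of 10).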
