The problem of item selection is complicated by the fact that each item will change in difficulty by some unknown amount as time elapses. If no additional training is given, the item will probably increase in difficulty as forgetting proceeds. If additional training is given, as is typically the case with students taking the first semester of a one-year introductory course, some items may conceivably decrease in difficulty. In either case, the amount of change is unknown, and sometimes even its direction, which makes it difficult to apply rational techniques of item selection.

The measurement of retention is particularly sensitive to the difficulty of the task or item; if the task is too difficult, performance may be uniformly low, in which case differences between groups will be masked. In laboratory experimental work on learning, the savings method in relearning provides the most sensitive measure of retention and has been generally adopted wherever maximal sensitivity is desired.

Several writers (e.g., Cronbach and Warrington (1), Guilford (2), Gulliksen (3), and Lord (4)) have dealt with the relation between the ability of the group tested and the difficulty of the items to be used in constructing a sensitive test. Lord (4), for example, points out that a test with maximal discriminative power for examinees at a given ability level should have all items of the same difficulty, such that half of the examinees at the given ability level will pass each item and half will fail it. In a test-retest study designed to measure course retention, we may consider that we are comparing test performance at two levels of ability: before and after forgetting. Therefore, knowledge of the discriminative power of an item in the initial test provides no clear indication of its discriminative power in the re-test, unless the amount of forgetting (i.e., the new ability level) is known in advance.
But, of course, the amount of forgetting is exactly what we are trying to find out.

This problem arose during an experimental comparison of two methods of teaching general psychology: a conventional course with three class meetings per week and a self-directed study course with about one class meeting per week. It was desired to measure retention of course materials after an interval of 15 months, a situation in which, unfortunately, retention may be expected to be low and the sensitivity of the instrument becomes especially important. Because we had no basis for a rational selection of items at an optimal difficulty or discrimination level, we decided to construct the test in such a way as to provide some information for future experiments of this kind. This approach had the advantage of hedging our bets. By using items of varying but known discriminative power and difficulty in the pre-test, it appeared moderately certain that some of them would be useful in the post-test. The present report deals, therefore, with two hypotheses:

1. Items that shift in difficulty toward the optimal level of difficulty will become more discriminating with the passage of time, and items that shift away from the optimal level of difficulty will become less discriminating.

2. In a situation where forgetting occurs, items that were initially much easier than the optimal difficulty level will shift toward the optimal level and will consequently have greater discriminative power in the re-test than items that were initially closer to the optimum in difficulty.

The first hypothesis is more general, but has the limitation that it cannot serve as the basis for item selection until the re-test has been administered. The second hypothesis requires an assumption about the direction of change in ability in the population being tested, and with that restriction it can provide a guide to the selection of items for the re-test on the basis of initial test results.
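The reasoning behind the two hypotheses can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the variance of a pass/fail item, p(1 - p), where p is the proportion of examinees passing, serves as a rough proxy for the item's potential to discriminate; this quantity is largest at p = 0.5, the optimal difficulty described above. The function names and the pass proportions below are hypothetical and are not taken from the study, and the variance proxy is not the discrimination index used in it.

```python
# Illustrative sketch only: item variance p * (1 - p) is used as an assumed
# proxy for discriminative potential. It is maximal when half the examinees
# pass the item (p = 0.5).

OPTIMAL_P = 0.5  # pass proportion at which a dichotomous item's variance peaks


def item_variance(p: float) -> float:
    """Variance of a pass/fail item whose pass proportion is p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p * (1.0 - p)


def shift_report(p_initial: float, p_retest: float) -> dict:
    """Compare an item's pass proportion before and after forgetting."""
    moved_toward = abs(p_retest - OPTIMAL_P) < abs(p_initial - OPTIMAL_P)
    return {
        "moved_toward_optimum": moved_toward,
        "variance_change": item_variance(p_retest) - item_variance(p_initial),
    }


# Hypothetical pass proportions (not data from the study).
# An initially easy item that becomes harder moves toward the optimum.
easy_item = shift_report(p_initial=0.90, p_retest=0.60)
# An initially near-optimal item that becomes harder moves away from it.
middling_item = shift_report(p_initial=0.55, p_retest=0.25)

print(easy_item)
print(middling_item)
```

Under these assumed values, the initially easy item gains variance as forgetting moves it toward the optimum, while the initially near-optimal item loses variance as it moves away, which is the pattern the second hypothesis anticipates.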