Abstract

Every major textbook in measurement discusses the improvement of a test through the use of item statistics. Most students taking a course in test construction remember that item analysis can be useful in item selection decisions. Unfortunately, they tend to forget that item analyses can also be useful for item revision. Too frequently, a person doing an item analysis of a multiple-choice test fails to go beyond computing the item difficulty and discrimination indices. When only this superficial analysis is completed, the reasons underlying any item failures cannot be discerned and item revision is difficult. Item revision can be accomplished most effectively only when the responses to each of the foils have been examined.

The purpose of this paper is to study the value of using a complete item analysis in rewriting items that have been shown to lack appropriate discrimination power. The researchers were interested in seeing whether it was more efficient to rewrite poor items than to write new items to improve the discrimination power of a test.

METHOD

About 600 to 700 students take an introductory educational psychology course at Michigan State University each term. For many years, a part of the evaluation procedure has been multiple-choice examinations. The tests are revised each quarter. Past revisions have been conducted primarily by taking previous exam questions that discriminated well and are still appropriate in content, and using them together with new items written by the instructors. Individual items were seldom rewritten through revision of the foils, and even when that was done, no systematic follow-up of the rewritten items was undertaken to determine whether their discriminating power had improved.

An instructor familiar with the course content was asked to review a 60-item test given the previous quarter and to select items that were appropriate in content for the test being prepared.
A complete item analysis had been conducted on this test, showing the difficulty and discrimination indices as well as the percent of people in the upper and lower 27% who responded to each alternative.¹ Thirty-two items in this test were found to be still appropriate in content. Of these 32 items, 18 were found acceptable for use without revision and 14 were in need of revision. These 14 items were revised by an instructor. The major revisions were accomplished by looking at the item analysis data on the foils and revising those foils that were not pulling in the proper direction, that is, those that were not more attractive to the lower group than to the upper group.

The total time needed

¹ The discrimination index used was the D index. See Findley (1956) and Englehart (1965) for a more thorough discussion.
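
The kind of complete item analysis described above can be illustrated with a minimal sketch in Python. It is not the authors' procedure; it only assumes the usual reading of the D index as the proportion answering correctly in the upper 27% minus the proportion in the lower 27%. The function name `item_analysis`, its parameters, the difficulty estimate (the mean of the two group proportions), and the sample data are all hypothetical choices made for illustration. The rule for flagging a foil follows the text: a foil is flagged when it is not more attractive to the lower group than to the upper group.

```python
from collections import Counter


def item_analysis(responses, key, total_scores, options, fraction=0.27):
    """Analyze one multiple-choice item (illustrative sketch only).

    responses    -- option chosen by each examinee on this item
    key          -- the keyed (correct) option
    total_scores -- each examinee's total test score, same order as responses
    options      -- all options offered, e.g. "ABCD"
    fraction     -- share of examinees in each extreme group (0.27 here)
    """
    n = len(responses)
    if n != len(total_scores):
        raise ValueError("responses and total_scores must have the same length")

    # Rank examinees by total score, highest first. Ties keep their input
    # order, which is a simplification assumed for this sketch.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    group_size = max(1, round(n * fraction))
    upper = Counter(responses[i] for i in order[:group_size])
    lower = Counter(responses[i] for i in order[-group_size:])

    # Proportion of each extreme group choosing each option: (upper, lower).
    table = {
        opt: (upper[opt] / group_size, lower[opt] / group_size)
        for opt in options
    }

    p_upper, p_lower = table[key]
    difficulty = (p_upper + p_lower) / 2  # assumed estimate from the two groups
    d_index = p_upper - p_lower           # D: upper minus lower proportion correct

    # A foil "pulls in the proper direction" only if it attracts the lower
    # group more than the upper group; everything else is flagged.
    weak_foils = [
        opt for opt in options
        if opt != key and table[opt][1] <= table[opt][0]
    ]
    return {
        "difficulty": difficulty,
        "d_index": d_index,
        "option_table": table,
        "weak_foils": weak_foils,
    }


# Hypothetical data: ten examinees, keyed answer "A".
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
answers = ["A", "A", "B", "A", "C", "A", "B", "C", "B", "A"]
result = item_analysis(answers, "A", scores, "ABCD")
print(result["d_index"], result["weak_foils"])
```

In this made-up example, foil B is chosen equally often by both groups and foil D by no one in either group, so both are flagged as candidates for revision, while foil C draws only from the lower group and is left alone.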
