INTERACTIONS BETWEEN ITEM CONTENT AND GROUP MEMBERSHIP ON ACHIEVEMENT TEST ITEMS

Robert L Linn, Delwyn L Harnisch

doi:10.1111/j.1745-3984.1981.tb00846.x

Abstract

It is often argued that certain types of items are biased against some groups. Of particular concern is the possibility that non-essential characteristics of particular test items may result in misleadingly poor performance for minority and/or socioeconomically disadvantaged children. For example, when vocabulary is incidental to the skill that the items are purported to measure (e.g., in an arithmetic story problem), the use of words that are less familiar to members of one group than to members of another may result in a biased indication of the relative performance of the two groups.

A variety of student characteristics could interact with item characteristics to affect overall performance on a test. Ethnic group membership and socioeconomic status are but two of many potentially important characteristics. Differences in motivation or in test-taking anxiety could also lead to interactions with characteristics of test items. The identification and understanding of possible interactions between student characteristics and the characteristics of items used to measure student achievement could contribute to the development of improved measurement procedures.

Item characteristics that have the potential to distort test results by interacting undesirably with student characteristics can be investigated experimentally. In such a study, item characteristics would be systematically varied and the performance of groups differing in characteristics such as socioeconomic status or level of anxiety would be compared. A study by Medley and Quirk (1974) is an example of this approach. Medley and Quirk used altered content specifications for the general education items of the National Teacher Examinations (NTE). For two experimental forms of the examination, the proportion of items involving contemporary culture (modern items) and the proportion of items involving black cultural contributions (black items) were increased, and the proportion of items dealing with classical contributions (traditional items) was reduced. The relative performance of black and white candidates was then compared on the three types of items. The black candidates did relatively better on the black and modern items than on the traditional items, and, when black and modern items were compared, their relative performance was better on the black items.

Full Text