By the time children in the United States reach seventh grade, half declare no interest in science (Office of Science and Technology Policy, 1991). Among girls, this lack of interest appears to be particularly pronounced (S. Johnson, 1987; Jones, Mullis, Raizen, Weiss, & Weston, 1992). At the same time, girls’ and boys’ performances on standardized tests of science achievement begin to diverge, with girls falling behind boys. This divergence is well supported by large-scale studies such as those of the International Association for the Evaluation of Educational Achievement, or IEA (1988), the National Assessment of Educational Progress (NAEP), 1970-1986 (Mullis & Jenkins, 1988), and the British Columbia Science Assessments (Bateson & Parsons-Chatman, 1989). In the science classroom, however, girls perform as well as, or better than, boys (Maccoby & Jacklin, 1974). Therefore, standardized tests are thought to under-predict girls’ science achievement (Linn, 1991). Although this gender disparity has been attributed to several factors, there is considerable concern that the difference may be an artifact of the method of measurement (Bateson & Parsons-Chatman, 1989; Bolger & Kellaghan, 1990). That is, something about the test itself may put girls at a disadvantage. Girls’ lower test scores, in turn, are thought to undermine their self-perceptions of competence, leading to their loss of interest in science and eventual departure from the science “pipeline” (Oaks, 1990; Rosser et al., 1989). The current reform rhetoric includes a call to change the method by which we evaluate students’ achievements. To do so will “open gates of opportunity rather than close them off” (National Commission on Testing and Public Policy, 1990, p. x).
The belief seems to be that replacing traditional assessment methods with new alternative methods, such as performance-based assessments, may eliminate the gender bias in testing (Jenkins & MacDonald, 1989; National Center for Improving Science Education, 1989). This article begins by reviewing what we know about gender differences on traditional tests of science achievement and what advocates hope to gain by changing to performance-based assessments. Then, as an initial look at the effect of new forms of testing on males’ and females’ science achievement, the scores of male and female students on performance-based assessments are compared. Finally, these findings are discussed in the context of science reform.