ABSTRACT

Several studies have shown that, on average, women perform slightly better than men on free‐response tests, while men perform slightly better on multiple‐choice tests. Studies of the College Board Advanced Placement (AP) Examinations have revealed a similar phenomenon. For almost all AP Examinations, men score higher on average on both parts of the tests, but sex differences on the free‐response parts are almost always smaller, and for some tests they are nonsignificant.

Two AP Examinations, U.S. History and European History, were selected for study because sex differences on the free‐response parts were nonsignificant while sex differences on the multiple‐choice parts were large. Random samples of free‐response booklets were drawn from the 1986 administrations of both examinations. The responses were rated and analyzed on a number of dimensions: English composition quality, historical content, responsiveness, factual errors, handwriting quality, neatness, and number of words written. All variables were then used to predict the free‐response scores. Several significant predictors were observed: the AP multiple‐choice score, historical content, English composition quality, and the number of words written.

Predictor variables were classified by their degree of relevance to the measurement construct. The historical content ratings and multiple‐choice history scores were considered directly relevant; English composition quality and number of words written, indirectly relevant; and neatness and handwriting, irrelevant. For the two free‐response scores combined, the indirectly relevant variables, in combination, predicted about as well as the directly relevant variables. For the Part A free response, the indirectly relevant variables predicted best, while for the Part B free response, the directly relevant variables predicted best. Handwriting and neatness did not predict free‐response scores well at all.

The regression equations developed were also used to estimate over‐ or underprediction, by sex, of free‐response scores from the AP multiple‐choice score alone and from multiple predictors. As expected, the AP multiple‐choice score underpredicted female performance on the free‐response part of the test and overpredicted male performance. For the AP U.S. History Examination, the over/underprediction effect size was reduced to a nonsignificant level when the English composition quality of the free responses was added to the predictor set. For European History, adding English composition quality also reduced the over/underprediction effect size to a nonsignificant level, although the reduction was not as dramatic as that for U.S. History. No other predictor variables contributed importantly to the reduction of effect sizes.

To develop further insight into this test‐format phenomenon, AP files for the sampled cases were matched with College Board Admissions Testing Program (ATP) data to obtain Scholastic Aptitude Test (SAT) scores and English Composition Test scores. In U.S. History, the SAT‐verbal, Test of Standard Written English (TSWE), and English Composition Test (ECT) scores made significant independent contributions to the prediction of AP free‐response scores over and above what was possible using the AP objective (multiple‐choice) score, especially for Part A. In European History, SAT‐verbal scores made significant independent contributions, although TSWE and ECT scores did not.

This study suggests that format effects are real and cannot be attributed to bias in scoring or to totally irrelevant variables. When scoring was conducted analytically with a focus on historical content, no sex differences were observed on the free‐response portions. This is the same result observed for the regular administration readings, which are graded holistically and by readers different from those used for this study. The sexes appear to differ in how they respond to two legitimate but different ways of assessing history skills. The study raises the question of how much influence basic skills and aptitude should have on the outcomes of assessments in specific areas of achievement.
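To make the over/underprediction analysis concrete, the sketch below illustrates the general idea in Python with simulated data: fit a regression of free‐response scores on a predictor set and compare mean residuals by group, first with the multiple‐choice score alone and then after adding a composition‐quality rating. The variable names, the simulated effect sizes, and the numpy least‐squares fit are illustrative assumptions; they are not the study's data or its actual scoring or regression procedure.

```python
import numpy as np

def mean_residual_by_group(y, X, group):
    """Fit ordinary least squares y ~ X and return the mean residual
    (actual minus predicted) within each group label.

    A positive mean residual indicates the model underpredicts that
    group's scores; a negative one indicates overprediction.
    """
    X1 = np.column_stack([np.ones(len(y)), X])     # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # OLS coefficients
    residuals = y - X1 @ beta
    return {g: residuals[group == g].mean() for g in np.unique(group)}

# Hypothetical data: a free-response score that depends on both the
# multiple-choice score and an essay-quality rating, with the rating
# slightly higher on average for one group.
rng = np.random.default_rng(0)
n = 200
sex = np.repeat(["F", "M"], n // 2)
mc = rng.normal(50, 10, n)                         # multiple-choice score
essay = rng.normal(3, 1, n) + (sex == "F") * 0.5   # composition-quality rating
fr = 0.5 * mc + 5 * essay + rng.normal(0, 5, n)    # free-response score

# Multiple-choice score alone: expect under/overprediction by group.
print(mean_residual_by_group(fr, mc.reshape(-1, 1), sex))
# Adding composition quality should shrink the residual gap toward zero.
print(mean_residual_by_group(fr, np.column_stack([mc, essay]), sex))
```

Under these assumptions, the first model shows a positive mean residual for the group with higher essay ratings (underprediction) and a negative one for the other group, while adding the composition‐quality predictor brings both group means close to zero, mirroring the pattern the abstract describes.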