This study examined 40 kindergarten and first-grade teachers' ability to use a rubric to rate 20 first-grade writing samples. Twenty-three of the teachers were trained to interpret the scoring dimensions of the rubric, while seventeen were not. The purpose of the study was to investigate whether training raters to interpret the scoring dimensions on a rubric would increase reliability. Generalizability theory was used to estimate reliability by examining multiple sources of error and their possible interactions simultaneously. Because the teachers were nested within training conditions (by school), an ANOVA was run using a partially nested design. Variance estimates and generalizability coefficients were calculated from the ANOVA results for each scoring dimension and for the total raw score of the rubric. Variance estimates were also calculated for each training condition (trained vs. untrained) on each scoring dimension. Initial results indicated no increase in reliability due to training. Because raters were nested within training, the rater main effect could not be separated from the rater-by-training interaction. To further examine the nature of variation in the rater-within-training term, separate analyses were performed on the data for trained and untrained raters. The untrained raters showed greater variation on four of the six scoring categories, and the same pattern held for the total raw score of the writing samples. This smaller variance suggests that training improved raters' ability to interpret these scoring dimensions reliably. These findings have implications for rubric design and for teacher training in the use of portfolio assessment.
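To make the generalizability analysis concrete, the sketch below illustrates how variance components and a generalizability (G) coefficient can be estimated for a single training group, assuming a fully crossed persons-by-raters design with one rating per cell. The data are synthetic and the design is simplified relative to the study's partially nested analysis; none of the numbers reflect the study's actual results.

```python
import numpy as np

# Hypothetical ratings: 20 writing samples (persons) x 5 raters within one
# training group. Values are synthetic, not the study's data.
rng = np.random.default_rng(0)
n_p, n_r = 20, 5
person_effect = rng.normal(0, 1.0, size=(n_p, 1))   # true differences among samples
rater_bias = rng.normal(0, 0.5, size=(1, n_r))       # rater severity/leniency
scores = 3 + person_effect + rater_bias + rng.normal(0, 0.7, size=(n_p, n_r))

# Two-way ANOVA decomposition (persons x raters, one observation per cell,
# so the interaction is confounded with residual error).
grand = scores.mean()
person_means = scores.mean(axis=1, keepdims=True)
rater_means = scores.mean(axis=0, keepdims=True)

ms_p = n_r * ((person_means - grand) ** 2).sum() / (n_p - 1)
ms_r = n_p * ((rater_means - grand) ** 2).sum() / (n_r - 1)
ms_pr = ((scores - person_means - rater_means + grand) ** 2).sum() / ((n_p - 1) * (n_r - 1))

# Expected-mean-square equations yield the variance component estimates.
var_pr = ms_pr                           # interaction + residual error
var_p = max((ms_p - ms_pr) / n_r, 0.0)   # person (true-score) variance
var_r = max((ms_r - ms_pr) / n_p, 0.0)   # rater variance

# Relative G coefficient for the mean of n_r raters: person variance over
# person variance plus relative error variance.
g_rel = var_p / (var_p + var_pr / n_r)
print(f"sigma^2_p={var_p:.3f}  sigma^2_r={var_r:.3f}  sigma^2_pr,e={var_pr:.3f}")
print(f"relative G coefficient = {g_rel:.3f}")
```

Comparing the person, rater, and interaction/error components estimated separately for the trained and untrained groups is one way to see the pattern the abstract describes: smaller rater-related variance for trained raters implies more consistent interpretation of the scoring dimensions.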