Peer assessment has been increasingly recommended as a way to evaluate the professional competencies of medical trainees. Prior studies have assessed only single groups measured at a single timepoint. Thus, neither the longitudinal stability of such ratings nor differences between groups using the same peer-assessment instrument have been reported previously. Participants were all members of 2 consecutive classes of medical students (n = 77 and n = 85) at the University of Rochester School of Medicine and Dentistry who completed Years 2 and 3 of medical school consecutively. All participants were evaluated by 6-12 classmates near the end of both Year 2 and Year 3. Main outcome measures were mean numerical ratings on peer-assessed scales of professional work habits (WH) and interpersonal attributes (IA). Both scales had high internal consistency in both years (Cronbach's alpha 0.84-0.94). The IA and WH scales were moderately correlated with one another (r = 0.36 in Year 2, r = 0.28 in Year 3). Year 2 scores were predictive of Year 3 scores for both scales (WH: r = 0.64; IA: r = 0.62). Generalisability and decision analyses revealed that 1 class was consistently more discriminating with the WH scale, while the other was more discriminating with the IA scale. Depending on the class, year and scale, the number of raters needed to achieve a reasonable reliability ranged between 7 and 28. Although Year 3 peer ratings were consistently higher than Year 2 peer ratings for both WH and IA, individual scores were highly correlated across the 2 years, despite the fact that different individuals were chosen as peer raters. Peer-assessed abilities appear to be stable between Years 2 and 3 of medical school. Groups may differ in their ability to discriminate different kinds of skills. Generalisability analysis can be used to discover these patterns within groups.
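The decision-study step mentioned above (estimating how many raters are needed to reach a given reliability) can be illustrated with a minimal sketch. It assumes a simplified one-facet design in which raters are nested within students, so that the reliability of a mean over n raters is var_person / (var_person + var_error / n). The function names, the variance components and the 0.7 target are hypothetical values chosen for illustration; they are not taken from the study, which does not report its variance components or its reliability threshold here.

```python
def g_coefficient(var_person, var_error, n_raters):
    """Reliability of a mean rating over n_raters raters.

    var_person: variance attributable to true differences between students.
    var_error:  residual variance for a single rater (rater and error combined).
    Both are assumed inputs, e.g. from a variance-components analysis.
    """
    if var_person <= 0 or var_error < 0 or n_raters < 1:
        raise ValueError("need var_person > 0, var_error >= 0, n_raters >= 1")
    return var_person / (var_person + var_error / n_raters)


def raters_needed(var_person, var_error, target, max_raters=1000):
    """Smallest number of raters whose mean rating reaches the target reliability."""
    if not 0 < target < 1:
        raise ValueError("target must lie strictly between 0 and 1")
    for n in range(1, max_raters + 1):
        if g_coefficient(var_person, var_error, n) >= target:
            return n
    raise ValueError("target not reached within max_raters")


# Hypothetical variance components for two groups rated on the same scale.
# A group that discriminates less between students (smaller var_person
# relative to var_error) needs more raters for the same reliability.
less_discriminating = raters_needed(var_person=0.1, var_error=0.5, target=0.7)
more_discriminating = raters_needed(var_person=0.2, var_error=0.5, target=0.7)
print(less_discriminating, more_discriminating)
```

Under these made-up inputs the less discriminating group needs twice as many raters as the more discriminating one, which is the kind of between-group difference a generalisability analysis is meant to expose.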