Background
Fair assessment of clinical trainees' competence is crucial to both education and patient safety. Our previous research indicated that assessment judgments might be biased by assessors' recent experiences of judging preceding performances. In two studies, we aimed to assess whether such an effect might bias scores towards (cognitive assimilation) or away from (contrast effect) preceding performances. Additionally, we examined whether inducement to consider typical performances could mitigate the effect (robustness), and whether confidence in judgments predicted susceptibility to the effect (insight).

Methods
In two separate studies, consultant doctors were randomised to two groups in an internet-based experimental design. Participants viewed identical, scripted, validated videos of doctors in their first year of training. In study one, participants viewed either three good performances or three poor performances before viewing three borderline performances. In study two, participants viewed six performances in either ascending (two poor, two borderline, two good) or descending (two good, two borderline, two poor) order. Competence was scored on 6-point Mini-CEX assessment scales, and scores were compared between groups. Additionally, in study two, we collected 7-point confidence ratings and percent-better ratings (judgments of the proportion of a cohort expected to outperform the currently considered performance).

Findings
We included 90 consultant doctors (41 in study one, 49 in study two). Both studies showed contrast effects (ie, scores were biased away from preceding performances). In study one, the mean score for borderline videos was 2·7 out of 6 (SD 0·69, 95% CI 2·4–3·0) when preceded by good videos compared with 3·4 out of 6 (0·55, 3·1–3·7) when preceded by poor videos (p=0·001). Failing scores were allocated by 55% and 24% of participants in these groups, respectively (p<0·001).
In study two, percent-better ratings did not mitigate the effect (mean ratings 43·4 [SD 13·7, 95% CI 38·4–48·5] vs 57·4 [10·4, 52·5–62·3] for poor-to-good and good-to-poor groups, respectively; p<0·001). Confidence was unrelated to susceptibility.

Interpretation
Assessors' scores repeatedly showed moderate-strength contrast effects sufficient to alter pass or fail decisions. Continued susceptibility to this effect despite use of the percent-better rating (designed to stimulate long-term memory of other trainees) indicates robustness. Equal susceptibility to the effect among participants reporting high confidence and those reporting low confidence in their scores indicates lack of insight. This potentially unconscious influence might substantially affect the fairness of clinical examinations.

Funding
The Academy at UHSM, Association for the Study of Medical Education.