Context and setting Instead of simply assessing residents' data-gathering skills using traditional standardised patient (SP) checklists, we also sought to assess their ability to anticipate, elicit and interpret the physical findings associated with the main diagnostic possibilities relevant to given cases.

Why the idea was necessary Diagnostic accuracy is maximised by having both clinical signs and diagnostic hypotheses in mind when performing a physical examination (co-selection): having a diagnosis in mind focuses attention on specific signs and vice versa. In contrast, if one is simply being thorough in a mechanical fashion, the number of signs perceived, even when actually present, decreases significantly. Thus, physical examination (PE) checklist scores alone are not sufficient measures of clinical reasoning or data-gathering skills.

What was done We developed a procedure to assess residents' ability to conduct a hypothesis-driven PE. Six cases were constructed, each providing the resident with a brief history and 2 plausible diagnoses, such as a 35-year-old woman with shoulder pain (etc.) that could represent either rotator cuff tendonitis or adhesive capsulitis. For each case, residents were asked to: list the expected positive PE findings for each diagnosis; examine an SP who simulated the findings of 1 of the diagnoses and who noted the accuracy of the PE manoeuvres; and document their findings and working diagnosis.

Evaluation of results and impact A total of 59 Year 1 and 2 internal medicine residents each saw 3 of the 6 cases as part of a required formative assessment of their clinical skills. On average, residents anticipated 37% of the positive findings related to each of the 2 diagnoses for each case. They then elicited an average of 60% of the potentially positive findings for each diagnosis, with about two-thirds of these PE manoeuvres (65%) executed correctly.
Stated negatively, about one-third of the PE manoeuvres (35%) were executed incorrectly. Looking only at the findings that would effectively discriminate between the 2 competing diagnoses in each case, residents anticipated about half of the discriminating findings (52%) and elicited 67% of them, with a quarter of these discriminating PE manoeuvres (25%) executed incorrectly. Residents documented 26% of the elicited findings (25% of the discriminating findings). The mean diagnostic accuracy for the final working diagnosis was 54% (range 12–92%).

These residents had only a limited number of anticipated findings in mind before they conducted the PE, representing about a third of the possible findings. They did attempt to elicit more findings than they had anticipated, but about a third of their PE manoeuvres were executed incorrectly and only about a quarter of the findings were correctly documented. Limited anticipation, incorrect manoeuvres and poor documentation are likely contributors to the low overall diagnostic accuracy observed (mean 54%). These findings suggest that assessing the ability to anticipate, correctly elicit and interpret findings specific to diagnostic alternatives provides a useful profile of resident skills, one that can help to focus instructional interventions and to construct more detailed assessment reports.