Evaluation of medical education is a complex and challenging process. The Supreme Court of the United States, in trying to define when free speech becomes pornography, noted simply that you will know it when you see it. The proverbial blind men, trying to define the elephant, described the pachyderm by its individual parts without being able to see the whole picture. Miller, in his 1989 invited address1 to the Research in Medical Education group of the Association of American Medical Colleges, suggested a pyramid as the framework for assessment of clinical performance, the bottom representing Knowledge and the top Action. Most of the clinical assessment on which we currently rely sits at the bottom of the pyramid; ie, knowledge. We use in-house testing to examine residents' cognitive skills, and the American Board of Pediatrics continues that approach with a written examination for certification and recertification. We pay lip service to advancing up the pyramidal model, with only some programs using standardized patients, other simulations, and direct observation of resident performance in a disciplined manner. Computer technology and other resources offer an opportunity to change the way we evaluate trainees and to assure the public that we are graduating competent, caring pediatricians.

Evaluation should be not only summative but formative as well. We always need to ask, “How can I improve my skills?” As educators, we should ask, “How can we improve the skills of the learners?” In addition, educational evaluation should consider assessment of other elements of the educational enterprise: the curriculum, faculty, record system, clinical experience, feedback system, self-assessment process, interpersonal and educational skills, psychosocial and humanistic skills, and ethical approaches to patient care, to name a few.
Thus, like the proverbial elephant, evaluation is not only a massive undertaking but also highly complex in nature. No single method of evaluation will suffice. Multiple approaches must be used, often with minimal to no correlation among the strategies. Two excellent reviews summarize the multiple perspectives associated with the evaluation of residency competence.2,3

Research in medical education has improved greatly in the past few years. Recent studies rely on increasingly objective methods of measuring clinical competency. Control and comparison groups are more frequently used to determine whether an intervention directly affects performance. Qualitative approaches offer additional ways to assess residency education and program evaluation. Newer objective strategies, such as the OSCE (Objective Structured Clinical Examination), have been introduced, with some evidence of efficacy but with remaining questions of whether this method can be the gold standard or is simply another way of assessing the complex nature of clinical skills.4,5 The standardized patient program, including the use of children as standardized patients, has been effective in evaluating and teaching clinical skills, although some colleagues call these patients fake.6–8

In this issue of the Journal of the Ambulatory Pediatric Association, Lopreiato and colleagues9 present an excellent example of an evaluation of a residency program, assessing curriculum, knowledge, and clinical skills in an important component of pediatrician competence: health maintenance. They answered the key question of educational research: “What difference did the intervention make?” Such research requires control or comparison groups, although reasonable comparison groups are often difficult to assemble. Lopreiato chose a contemporaneous comparison group from his own institution, so as to control the variability of the educational context as much as possible.
He also used standardized patient mothers to assess clinical skills: real patients who bring true life experiences as mothers seeking medical care for their children. The mothers were trained to give effective feedback and evaluation to residents without using scripts. This method, in our view, is important, combining reliable and valid ways to measure clinical skills. It is also a creative attempt to standardize the evaluation process while keeping it as real as possible for the learners. The authors also assessed the curriculum, used multiple sources for evaluation (including a systematic chart audit), and documented improvement in clinical skills. Their methodology further allows them to identify areas with no improvement in skills, so that they can plan for future improvement. For this exemplary work, Lopreiato and colleagues received the 1998 Ray E. Helfer Award for Innovation in Pediatric Education from the Pediatric Academic Societies. Congratulations!