Introduction/Background The National Board of Osteopathic Medical Examiners (NBOME) administers the Comprehensive Osteopathic Medical Licensing Examination (COMLEX-USA) Level 2-Performance Evaluation (PE) to all graduating osteopathic medical students as part of the licensure pathway in the United States. Candidates rotate through 12 standardized patient (SP) cases. For each case, a committee of osteopathic physicians creates a checklist (CL) that the SP uses after the encounter to document the history and physical examination performed. Although performance-based assessments have become common in both education and high-stakes testing, case and checklist development remain inadequately described (1-4). Checklists are often based on expert consensus because few evidence-based performance criteria have been available. Emerging patient care literature may help improve checklist development procedures. The purpose of this study is to compare checklists developed through expert consensus with checklists developed from a review of the literature and evidence-based practices. We hypothesize that checklist items based on evidence will contribute to a more reliable checklist.

Methods Two independent groups of case developers each worked to develop a new case and checklist for the COMLEX-USA Level 2-PE. Each group comprised five to six osteopathic physicians, an SP trainer, and an SP. Each checklist item was classified by category or system. For history items, the categories were: History of Present Illness (HPI), associated symptoms related to the chief complaint; Review of Systems unrelated to the chief complaint; Patient History, including past medical history, surgical history, medications, family history, and allergies; and Lifestyle Factors, including substance use and occupation.
Physical examination items were classified by system: HEENT (head, eyes, ears, nose, throat); Cardiovascular; Pulmonary; Gastrointestinal (GI); Genitourinary (GU); Neurological (Neuro); and Musculoskeletal. This classification helps balance the checklist for content. Each group was then asked to identify a rationale for selecting each checklist item. The rationale scheme was loosely based on the principles of the Strength of Recommendation Taxonomy (SORT): A = CL item based on evidence-based information (e.g., positive predictive values and likelihood ratios of symptoms and signs with clear patient outcomes); B = CL item based on patient care guidelines from a respected body (United States Preventive Services Task Force, American College of Osteopathic Family Physicians, American Osteopathic Association, etc.); C = CL item based on panel majority consensus. The final checklist (all items from both groups) was placed into a pretesting station in the COMLEX-USA Level 2-PE.

Results A total of 25 items (18 history (HX) and 7 physical examination (PE) items) were suggested for the full checklist (CL 1). Eleven items were common to both groups (9 HX and 2 PE; CL 2). Seven items, all HX, were classified as items with clear patient outcomes based on the review of the literature (Category A above; CL 3). Three hundred seventy-five students encountered the case during the 2012-2013 test cycle. Each student received a score on each of the three checklists, and the reliability of each checklist was examined. Cronbach's alpha reliability analysis showed that CL 1 and CL 3 performed similarly (0.38 and 0.35, respectively), while CL 2 performed worst (0.12). Scores on the three CLs were also compared with the students' scores on the post-encounter SOAP note for this case using Pearson correlation coefficients (N = 375; null hypothesis ρ = 0). None of the correlations differed significantly from zero.
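
For readers unfamiliar with the two statistics reported above, the following is a minimal sketch of how Cronbach's alpha and a Pearson correlation could be computed from an examinee-by-item score matrix. It is not the authors' analysis code. The function names (`cronbach_alpha`, `pearson_r`) are hypothetical, and the data are synthetic, generated only to make the example runnable; the sample size and item count merely mirror the N = 375 and 7-item figures mentioned in the text.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a matrix with rows = examinees, columns = items.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of checklist items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # sample variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])


# Synthetic illustration only: these are NOT the study data.
rng = np.random.default_rng(0)
n_students, n_items = 375, 7                     # assumed sizes for the example
ability = rng.normal(size=(n_students, 1))       # hypothetical latent ability
noise = rng.normal(scale=2.0, size=(n_students, n_items))
items = (ability + noise > 0).astype(int)        # 1 = item done, 0 = not done
note_scores = rng.normal(size=n_students)        # hypothetical SOAP note scores

alpha = cronbach_alpha(items)
r = pearson_r(items.sum(axis=1), note_scores)
print(f"alpha = {alpha:.2f}, r = {r:.2f}")
```

Because alpha depends on how strongly items covary relative to the variance of the total score, a short checklist of related items can match the alpha of a longer checklist that mixes weakly related items.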
Conclusion Evidence-based review may yield a checklist that is substantially shorter than one developed by a small cohort of physicians through consensus, without appreciable loss of reliability. The longer checklists did not correlate better with student performance on the post-encounter note. More research on the development of examination materials is warranted.